The Journal of
Experimental Education


A periodical report of scientific investigations relating to child development, 
curriculum, learning, teaching, supervision, measurements, 
statistics, and experimental techniques. 


Volume XXIV September, 1955 Number 1

CONTENTS
Page
A Study of Professional Distances Between the Raters of Teachers and
Teachers Rated Earl Martin Grotke 1
An Investigation of the New York State Regents Examinations in Science
George Greisen Mallinson and Jacqueline V. Buck 43
A Comparison of Wechsler Children's Scale and Stanford-Binet Scores for
Eight- and Nine-Year Olds Frank C. Arnold 91


$1.50 A COPY


Published by Dembar Publications, Inc.,
Madison 3, Wisconsin.
Entered as second-class matter October 17, 1938 at the post office at Madison,
Wisconsin, under the act of March 3, 1879.


E YEAR —— PUBLISHED QUARTERLY 


EDITORIAL BOARD 


A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison 6, Wis. 


Arthur T. Jersild, Professor of Education, Teachers College,
Columbia University, New York City. Editorially
responsible for materials on child welfare, guidance, and
development, published each December.

Palmer O. Johnson, Professor of Education, University of
Minnesota, Minneapolis, Minnesota. Editorially responsible
for materials on measurements, statistics, and
methods of experimental research, published each March.


H. H. Remmers, Professor of Educational Psychology, 
Director Division Educational Reference, Purdue Univer- 
sity, Lafayette, Indiana. Editorially responsible for mate- 
rials on learning, teaching and supervision, published 
each September. 


J. Wayne Wrightstone, Director, Bureau of Educational
Research, Board of Education of the City of New York,
Brooklyn, New York, 110 Livingston Street, Brooklyn,
New York. Editorially responsible for materials on curriculum
construction, published each June.


CONTRIBUTING EDITORS 


Emmett A. Betts, Director, Betts Reading Clinic, Haver- 
ford, Pennsylvania. 


Leo J. Brueckner, Professor of Education, University of
Minnesota, Minneapolis, Minnesota.

Oscar K. Buros, Associate Professor of Education, Rutgers
University, New Brunswick, New Jersey.

Guy T. Buswell, Professor of Educational Psychology,
University of Chicago, Chicago, Illinois.


Harold D. Carter, Associate Professor of Education, Uni- 
versity of California, Berkeley 4, California. 


Leslie L. Chisholm, Associate Professor of Education, 
State College of Washington, Pullman, Washington. 


Herbert S. Conrad, Technical Consultant, College Entrance
Examination Board, Princeton, New Jersey.

Stephen M. Corey, Professor of Educational Psychology,
University of Chicago, Chicago, Illinois.

Robert A. Davis, Professor of Education, Director of
Bureau of Educational Research, University of Colorado,
Boulder, Colorado.

Harl R. Douglass, Director of College of Education,
University of Colorado, Boulder, Colorado.

Harold A. Edgerton, Director, Occupational Opportunities
Service, Professor of Psychology, Ohio State University,
Columbus 10, Ohio.

John C. Flanagan, Professor of Psychology, University of
Pittsburgh, Pennsylvania.

Carter V. Good, Dean, Teachers College, University of
Cincinnati, Cincinnati 21, Ohio.

Robert W. B. Jackson, Assistant Professor of Educational
Research; Assistant Director, Department of Educational
Research, Ontario College of Education, University of
Toronto, Toronto, Canada.

Harold E. Jones, Professor of Psychology and Director,
Institute of Child Welfare, University of California,
Berkeley 4, California.


Noel Keys, Professor of Education and Lecturer in Human
Relations, University of California, Berkeley, California.

D. Welty Lefever, Professor of Education, University of
Southern California, Los Angeles, California.

Edward A. Lincoln, Consulting Psychologist, Halifax,
Massachusetts.

Irving Lorge, Professor of Education, Executive Officer,
Institute of Psychological Research, Teachers College,
Columbia University, New York 27, New York.

A. R. Mead, Director of Educational Research, University
of Florida, 330 P. K. Younge Building, Gainesville,
Florida.

T. E. Newland, Lt. Comdr., USNR, 2702 Wisconsin Avenue,
N. W., Washington 7, D. C.

C. W. Odell, Professor of Education, University of Illinois,
Urbana, Illinois.

Willard C. Olson, Professor of Education, Director of
Research in Child Development, University of Michigan,
Ann Arbor, Michigan.

Valworth R. Plumb, Chairman, Division of Education and
Psychology, University of Minnesota (Branch), Duluth,
Minnesota.


S. L. Pressey, Professor of Educational Psychology, Ohio 
State University, Columbus, Ohio. 


Clarence E. Ragsdale, Professor of Education, University 
of Wisconsin, Madison, Wisconsin. 


William Reitz, Associate Professor of Education, College
of Education Examiner, Wayne University, Detroit 2,
Michigan.

Henry D. Rinsland, Professor of Education and Director
of Educational Research, The University of Oklahoma,
Norman, Oklahoma.

Robert T. Rock, Jr., Professor of Psychology, Head of
Dept. of Psychology, Graduate School, Fordham University,
New York City.


Philip J. Rulon, Professor of Education, Harvard Graduate 
School of Education, Cambridge 38, Massachusetts. 


Douglas E. Scates, Professor of Education, Duke University,
Durham, North Carolina.


John Schmid, Board of Examiners, Michigan State Col- 
lege, East Lansing, Michigan. 


Harold Seashore, Director, Test Division, The Psychologi- 
cal Corporation, New York 18, New York. 


David Segel, Educational Consultant, Specialist in Tests
and Measurements, Federal Security Agency, U. S.
Office of Education, Washington, D. C.

Paul W. Terry, Professor of Educational Psychology,
University of Alabama, University, Alabama.

Helen Thompson, Associate Attending Psychologist, New
York Post-Graduate Hospital, 303 East 20th Street, New
York 3, N. Y.

Robert L. Thorndike, Associate Professor of Education,
Teachers College, Columbia University, New York City.

Herbert A. Toops, Professor of Psychology, Ohio State
University, Columbus, Ohio.


Maurice E. Troyer, Director, Bureau of School Services, 
Syracuse University, Syracuse 10, New York. 


Helen M. Walker, Professor of Education, Teachers Col- 
lege, Columbia University, New York City. 


Beth L. Wellman, Professor of Psychology, Child Welfare
Research Station, State University of Iowa, Iowa City,
Iowa.

Guy M. Wilson, Emeritus Professor of Education, Boston
University, 33 Pine Street, Wellesley Hills, Massachusetts.

Paul A. Witty, Professor of Education, Director of Psycho-
Educational Clinic, School of Education, Northwestern
University, Evanston, Illinois.

Ernest R. Wood, Professor of Education, New York University,
New York.

D. A. Worcester, Chairman, Department of Educational
Psychology and Measurement, University of Nebraska,
Lincoln, Nebraska.


DEMOCRAT PRINTING COMPANY
MADISON, WISCONSIN


The Journal of 
Experimental Education 


A periodical report of scientific investigations relating to child development, 
curriculum, learning, teaching, supervision, measurements, 
statistics, and experimental techniques. 


Volume XXIV December, 1955 Number 2


CONTENTS 
Page 


The Effects of a "Causal" Teacher-Training Program and Certain Curricular
Changes on Grade School Children
Ralph H. Ojemann, Eugene E. Levitt, William H.
Lyle, and Maxine F. Whiteside 95

The Selection of Candidates for Teacher Education at the University of Wisconsin
Gustave John Stoelting 115

Differential Methods of Solving Selected Problems on the ACE Psychological
Examination Leone Anderson, Richard Rankin, Joy Richardson,
Julius Sassenrath, and Julius Thomas 133

Academic Attrition of Engineering Transfer Students J. Stanley Ahmann 141
College Level Study Skills Programs: Some Observations Walter S. Blake 147


PUBLISHED QUARTERLY $1.50 A COPY 


Published by Dembar Publications, Inc.,
Madison 3, Wisconsin.
Entered as second-class matter October 17, 1938 at the post office at Madison,
Wisconsin, under the act of March 3, 1879.




Journal of Experimental Education

Volume XXIV September, 1955 Number 1


A STUDY OF PROFESSIONAL DISTANCES
BETWEEN THE RATERS OF TEACHERS
AND TEACHERS RATED*


EARL MARTIN GROTKE
University of Southern California

*From the author's Ph.D. dissertation, University of Wisconsin; A. S. Barr, advisor.


SECTION I


The Problem

THIS STUDY attempts to show the relationship between the attitudes of raters and ratees and the ratings given to teachers. It is hypothesized that the lengths of "professional distances" between raters and ratees increase as teacher ratings decrease from good to average and average to poor.

Definition of Professional Distance. The concept of professional distance is adapted from the concept of social distance used in the field of sociology. In 1925, while studying racial attitudes, Bogardus devised an attitude scale called social distance. "He was interested in measuring degrees to which individual representatives of various racial and national groups were accepted or rejected.... Instead of making a distinction between favorable and unfavorable attitudes, however, he conceived the problem in terms of degrees of 'distance' which his subjects wished to keep between themselves and members of other groups. The more unfavorable the attitude, from this point of view, the greater the social distance, and the more favorable the attitude, the less the social distance. Thus the social distance between two intimate friends would be zero, and at the other extreme the attitude of a rabid anti-Semite toward Jews would represent maximum social distance."

As applied to the field of teacher evaluation, professional distance refers to the frequency of disagreements and divergency between two professional workers on what constitutes the professional role of the good teacher. Professional distance may be illustrated simply as follows: Worker A believes a teacher should keep her pupils absolutely quiet during class time; Worker B believes pupils should be permitted considerable freedom. Such disagreements (and agreements) on the professional role of the good teacher constitute professional distance. As one increases the number of disagreements between any two professional workers, the professional distance increases. The greater the frequencies, the longer the professional distance.

The second aspect of the definition of professional distance suggests the measurement of the degree of divergency between the points of view of two professional workers. This aspect may be illustrated by Worker A stating that she definitely believes a good teacher stands in front of the class when teaching; Worker B says she has no convictions on this issue, and Worker C says she definitely believes a good teacher stands in the rear of the room. Such a disagreement suggests that the professional distance is longer between Workers A and C than it is between A and B or B and C. The more divergent the points of view, the longer the professional distance; the less divergent the points of view, the shorter the professional distance.

Definition of the Professional Role of the Good Teacher. The concept of the professional role of the good teacher also is an adaptation from the field of sociology. It is adapted from the term "social role" which Cuber defines as "the culturally defined patterns of behavior expected or required of persons in specific social positions. ...behavior as used in this definition includes.... both overt acts and covert behaviors such as attitudes, values, and ideas." (10:232) Professional role may be similarly defined as the professionally determined behaviors expected or required of persons in a specific professional position, i.e., the position of classroom teacher. The role of teacher requires both covert and overt behaviors. In general, the covert behaviors may include desiring to teach, knowing subject matter, and believing in democracy. Overt behaviors are conducting lessons, manipulating teaching tools, and preparing reports.

When one learns the behaviors required of persons in the professional role of the good


TABLE I

SOME CORRELATIONS BETWEEN CRITERIA AND OTHER BEHAVIORS*

Criteria              Other Behaviors                              Correlations   Researcher

Pupil Change          Supervisory Rating                               .36        Rolfe
                      Practice Teaching Score                          .13        Jones
                      College Grades (4 yr. G. P. A.)                 -.08        Jones
                      Tests (American Council Psychological
                        Examination)                                  -.10        Rolfe

Supervisory Rating    Practice Teaching Grades                         .69        Bossing
                      College Grades                                   .19        Bossing
                      Tests (National Teachers Examination)            .51        Flanagan

Practice Teaching     Supervisory Ratings                              .69        Almy-Sorenson
Grades                College Grades                                   .49        Almy-Sorenson
                      Teaching Aptitude                                .19        Seagoe

Pupil Ratings         College Grades                                   .03        Lins
                      Tests (American Council Psychological
                        Examination)                                (illegible)   Lins

*All data for Table I is from A. S. Barr, "Measurement and Prediction of Teaching Efficiency: A Summary of Investigations," Journal of Experimental Education, XVI (June 1948).




teacher in contrast to the professional role of the poor teacher, one synthesizes all the best illustrations of "good" teaching he has seen or heard or read about. Teacher practices such as keeping the children absolutely quiet during class time, having them fold their hands while they listen to the teacher, and measuring the results of learning exclusively with standardized tests may be learned as "good" activities. On the other hand, dividing the class into small groups to work on individually chosen tasks, permitting as much freedom as possible, and measuring the results of learning by observing cooperative behavior patterns may be learned as "good" behaviors by a second professional worker who may be a rater of teachers. Likewise the personal traits that "good" teachers have, the modulated voice, the social poise, the ethical standards, all are learned by each professional worker to form his own concept of the "good" teacher and required of any person who would play the role of his "good" teacher.


How Concepts of the Professional Role of the Good Teacher Function in Teacher Evaluation. "Appraisal of any kind may be defined as an act of judgment, in which the judging implies both a criterion -- standard of some sort -- and a pertinent description of what is being judged." (18:172) In the field of teacher evaluation the criterion is one's concept of the professional role of the good teacher or some aspect of it; the pertinent description is a concept of the teacher being judged, or some aspect of her performance in the role of teacher. When a teacher evaluates herself, she compares what she thinks she is to what she thinks she should be, i.e., her concept of the professional role of the good teacher. As a result of her comparison, she arrives at a qualitative and/or quantitative expression representing the distance between her two concepts. When evaluations are made by a person other than the teacher, the evaluator compares his concept of the teacher's performance with his concept of the professional role of his "good" teacher. His comparison also results in a qualitative and/or quantitative expression representing the distance between his two concepts. From this point of view, all measurement can be thought of as expressions of distance between the criterion and the concept of what is being evaluated. Concepts of the professional role of the good teacher function as the criterion in teacher evaluation.

How Concepts of Professional Distance Function in Teacher Evaluation. When professional distance, that is, disagreements between two professional workers on what constitutes the professional role of the good teacher, exists between the teacher and her evaluator, their evaluations are likely to be different because their criteria are different. Doing an excellent job of teaching in the eyes of the teacher is approximating her own concept of good teaching. If her concept of good teaching is decidedly different from the concept of good teaching held by her rater, i.e., the professional distance between them is long, the evaluation that the rater may give her is apt to be poor. If, on the other hand, her concept of good teaching is similar to that of her rater, i.e., the professional distance between them is short, the evaluation that the rater may give her is apt to be good. Thus it is hypothesized that the length of professional distance increases as teacher ratings decrease from good to average and from average to poor.

The Measurement of Professional Distance. Professional distance suggests the comparison of the concepts of the professional role of the good teacher as held by any two professional workers. To make such comparisons, instruments were constructed to ascertain the overt and covert behaviors each professional worker expects from his "good" teacher. Specific teaching practices, teacher factors, and beliefs related to education were selected to appear on the instruments. Subjects were asked to respond by classifying each practice as good or poor; each factor as important or insignificant; and each statement of belief as one which they definitely believe or one that they definitely do not believe. Step intervals were provided for indicating in-between positions. Since distances between the teachers and their raters were sought, comparisons were made between the responses of the teachers and the responses of their raters. For each item on each instrument the distance between the two responses was assigned a weight value. To arrive at the total distance measured by the instrument, the weight values for all the items of that instrument were summed.*
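The scoring just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's procedure: the three-step coding of responses (`STEP`) and the use of the absolute step difference as the weight value are hypothetical choices made here, and the function names are invented for the example.

```python
# Hypothetical coding of one instrument's response scale (an assumption).
STEP = {"good": 0, "makes no difference": 1, "poor": 2}


def item_weight(rater_response, teacher_response):
    """Weight for one item: number of scale steps separating the two responses."""
    return abs(STEP[rater_response] - STEP[teacher_response])


def professional_distance(rater, teacher):
    """Sum the item weights over all items of one instrument."""
    if len(rater) != len(teacher):
        raise ValueError("both subjects must answer the same items")
    return sum(item_weight(r, t) for r, t in zip(rater, teacher))


# Made-up responses of one rater and one teacher to four items.
rater = ["good", "poor", "good", "makes no difference"]
teacher = ["good", "good", "makes no difference", "makes no difference"]
print(professional_distance(rater, teacher))  # 0 + 2 + 1 + 0 = 3
```

Under this coding, a full reversal (good against poor) contributes twice as much as a one-step difference, which mirrors the idea that more divergent points of view mean a longer distance.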

Conditions Under Which the Hypothesis Will Be Considered Substantiated. If the professional distance scores are lowest for the teachers rated good, higher for the teachers rated average, and highest for the teachers rated poor, then the hypothesis will be considered to be substantiated, and professional distance as measured by these instruments may be considered as an indicator of professional ratings. If professional distance scores appear in some other pattern, the hypothesis will be considered as not supported by the evidence.
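The decision rule above amounts to an ordering check. The sketch below assumes that each rating group is summarized by its mean distance score; the text does not say which summary is used, so the mean, the function name, and the sample numbers are all illustrative assumptions.

```python
from statistics import mean


def hypothesis_substantiated(scores_by_rating):
    """True when the mean distance rises strictly from good to average to poor."""
    good = mean(scores_by_rating["good"])
    average = mean(scores_by_rating["average"])
    poor = mean(scores_by_rating["poor"])
    return good < average < poor


# Made-up scores for illustration only; they are not data from the study.
example = {"good": [4, 6, 5], "average": [7, 9, 8], "poor": [12, 10, 11]}
print(hypothesis_substantiated(example))
```

Any other ordering of the three group summaries returns `False`, which corresponds to the "some other pattern" case in the text.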

Summary. This study attempts to show the pattern of the lengths of professional distance as they exist between raters and the teachers they rate. Professional distance is defined as the number and divergency of the disagreements between the concepts held by two professional workers on what constitutes the professional role of the good teacher. Each worker learns from his own unique sequence of experiences his concept of the professional role of his "good" teacher. One's concept of the professional role of the good teacher is used as a criterion to evaluate one's own teaching and the teaching of others. When rating others, the resultant evaluations probably vary from good to poor as professional distances vary from short to long. Special instruments and procedures are used to measure professional distance.

*Instruments used in this study are described in detail in Section III. Copies of them appear as Appendices A through O, which will be found in the original thesis on file in the Library, University of Wisconsin, Madison, Wisconsin. Procedures for measuring professional distances are described in Section IV.


SECTION II 
The Method of Research 


A MODIFIED form of the causal-comparative method of research was employed in this study. Two phenomena were investigated: one, a teacher considered a good teacher; the other, a teacher considered a poor teacher. The first modification recognized a middle group, called average, and believed to be between the two extremes.* Therefore, the absence of the first phenomenon was teachers rated average or poor; the absence of the second phenomenon was teachers rated average or good. The second modification assumed that circumstances attending the presence of the phenomena may exist in degrees, i.e., lengths of professional distance.

Some arbitrary limits were made for the study. All subjects were selected from elementary school faculties. Teachers in the group studied taught between grades one and six. Whether the relationship [...] faculties [...] a part of the study. Another limitation was made by definition. The [...] the teacher was limited to [...]

[...] community with a [...] 100,000 was located [...] When the [...] elementary schools for Anglo- and Latin-American children. Two of the 19 schools were not accepted for the study: one had been established only a few weeks before the study was begun; the second was staffed by teachers who the principal felt could not be considered as either good or poor. Of the 17 schools selected, 15 were administered by their own principals. The other two were administered by one principal who felt competent to rate the teachers in both schools.

The second community had a population of approximately 70,000 and was located on the coast of Lake Michigan. Of the 14 elementary grade schools, 13 were selected for the study. One was not accepted because the teachers failed to cooperate. Among the 13 accepted, one elementary school was housed in a building that also housed classes for orthopedic and mentally handicapped children. The principal in this elementary school was administrator for all divisions in his building. Two elementary schools were housed in buildings along with junior high schools. The principals of the elementary schools were also principals of the junior high schools. Six small schools were administered by three principals, each principal serving as the head of two schools. Each of these three principals felt competent to rate the faculties in each of his schools. Each of the other schools was administered by its own principal. With the schools from the first community, the total group for study consisted of 30 elementary school faculties administered by 26 principals.
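The totals of 30 faculties and 26 principals follow from the counts given for the two communities. The short tally below simply re-adds those stated figures; the variable names are illustrative.

```python
# Counts as stated in the text for each community.
first_schools = 17
first_principals = 15 + 1          # 15 schools with their own principal, 1 principal for the other two

second_schools = 13
shared_principals = 3              # each heads two of the six small schools
own_principals = second_schools - 2 * shared_principals  # remaining schools, one principal each
second_principals = shared_principals + own_principals

total_schools = first_schools + second_schools
total_principals = first_principals + second_principals
print(total_schools, total_principals)  # 30 26
```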

The principals of each of the 30 schools served as the raters of their teachers. Faculties of the schools ranged from 6 to 47 teachers. Each principal was asked to select from his staff(s) one of the best teachers, one of his average teachers, [...]

[...] On the first instrument the subjects classified teacher practices as "good", "poor", or "makes no difference". On the second instrument the subjects classified teacher factors on a five point scale from "of utmost importance" to [...]. On the last instrument the subjects indicated their pattern of beliefs related to education. The instruments are described in detail in Section III.
In each case the subject's cooperation was sought and received. None of the group was told the hypothesis being studied. Sufficient time was given to permit each person to respond at his leisure. Instructions on the method of response appeared on each measuring device. The qualifications of the persons and their responses indicated that the instruments were not misinterpreted. When omissions were considered oversights, the subjects were asked to complete the instrument. Omissions of one principal and three teachers, however, were due to a difference in point of view. In these cases it was inferred from their comments on the margin that they considered the items they omitted as "not making any difference". Their omitted responses were considered as such. These were few in number. One requirement specified that persons responding to the instruments would not converse about the study before or during the data collecting. Upon collecting the instruments from the schools, information on compliance with this request was asked. Responses indicated cooperation.
The study sought the relationship of professional distance to teachers' ratings. To determine professional distance, the responses of each rater were compared with those of the teacher he rated. Weight values were assigned to all disagreements. Three analyses of the assigned weights were made. Professional distance scores were computed by summing the assigned weights. Frequencies of disagreement were computed by counting the number of assigned weights. Item analyses were made by computing the professional distance and frequency of disagreement for each group of teachers for each item. Scores were compared to determine whether they substantiated the hypothesis.
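The three analyses can be sketched as follows. The data layout is an assumption made for illustration: each rater-teacher pair is a list of item weights, with 0 standing for agreement (no weight assigned), and the sample values are invented.

```python
from collections import defaultdict

# Each record: (rating group, item weights for one rater-teacher pair).
# A weight of 0 stands for agreement, i.e. no weight assigned (an assumption).
pairs = [
    ("good", [0, 1, 0]),
    ("good", [0, 0, 1]),
    ("poor", [2, 1, 0]),
]


def distance_score(weights):
    """Professional distance score: the sum of the assigned weights."""
    return sum(weights)


def disagreement_frequency(weights):
    """Frequency of disagreement: the number of items that received a weight."""
    return sum(1 for w in weights if w > 0)


def item_analysis(records):
    """Per (group, item): [total distance, frequency of disagreement]."""
    table = defaultdict(lambda: [0, 0])
    for group, weights in records:
        for item, w in enumerate(weights):
            table[(group, item)][0] += w
            table[(group, item)][1] += 1 if w > 0 else 0
    return dict(table)


for group, weights in pairs:
    print(group, distance_score(weights), disagreement_frequency(weights))
print(item_analysis(pairs)[("poor", 0)])  # [2, 1]
```

Keeping the score and the frequency separate reflects the two aspects of the definition: how many disagreements there are, and how far apart the two responses lie.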


SECTION III


Measuring Instruments


THREE DATA gathering devices were constructed and used in the study. They all sought to determine the subjects' concepts of the professional role of the good teacher, in terms of teacher factors, teaching practices, and beliefs related to education.*

*Copies of instruments will be found in Appendices A, B, C, [...] University of Wisconsin.


The Evaluation of Teaching Practices. The first of the three instruments dealt with teaching practices. Fifty-one of the practices appearing on the instrument were extracted from Table XLI, A Summary of Theory and Practice in Teaching Social Science, in A. S. Barr's Characteristic Differences in the Performance of Good and Poor Teachers of the Social Studies. (2:1004) The table includes data on the number of experts who consider the practice as good and also the number of experts who consider the practice as poor. In the construction of the instrument only those practices were selected on which the experts showed a marked degree of disagreement. Those practices on which the minority group of experts equalled ten or more percent of the majority group made up the first 51 items for the instrument. To this number of items were added the following two, which seemed to be controversial:


Measures results of learning by changed
attitudes and behaviors; and

Measures results of learning by quality of
pupils' projects and exercises.


Subjects were asked to categorize each of the total of 53 practices as "good", as "poor", or as "making no difference", i.e., neither good nor poor. The various methods of scoring the instrument are described in the following chapter, Analysis of Data.

The Evaluation of Teacher Factors.—An instrument to obtain
the subjects’ ranking of the importance of specific teacher
factors was constructed by using the teacher factors that
appear on the official rating scale used by the school
system in the first community studied. The author of this
study was also the author of the rating scale. Twenty-five
teacher factors appearing on both devices are divided into
four classifications: (1) the teacher as a person; (2) the
teacher as a director of learning; (3) the teacher as a
friend and counselor of students; and (4) the teacher as a
member of a professional staff. Such items as “physically
fit”, “emotional stability” and “good speaking voice”
appeared in the first classification. “Establishes
attainable goals cooperatively with students”, “Has mastery
of subject matter”, and “Skillful with a variety of tests
and measurement devices” appeared in the second
classification. “Builds a sense of security and personal
worth in all students”, and “Considers the development of
the child as an individual more important than subject
matter mastery” appeared in the third. “Guided by
professional ethics” and “Actively cooperates in staff
operations” appeared in the last classification. The
subjects were asked to categorize each of the 25 factors
into one of five categories: (1) of utmost importance,
(2) very important, (3) important, (4) usable, but not
important, and (5) insignificant. It was assumed that those
factors deemed important were those that the subject
required of the person who plays the role of his good
teacher. The “scoring” techniques used for this instrument
are described in Section V.

The Inventory of Beliefs. — An instrument to determine
which beliefs related to education were held by the
subjects was constructed for this research. It consisted of
12 groups of statements of beliefs. Each group contained 10
statements. The names of the groups were Teacher-Pupil
Relationships, Teaching Profession, Community
Relationships, Objectives of Education, The Schools’ Stand
on Controversial Issues, Minority Groups, Democracy and
Government, Economic Problems, Organized Labor, Religion,
and Life Values. Each of the 120 statements began with the
words, “I believe that . . .” Illustrations of the items
are:


I believe that the public schools have an obligation to
provide sex education.

I believe that the teachers who actively work for social
and economic reforms are poorer teachers than those who
stick to their own subject matter fields.

I believe that another world war will come eventually,
regardless of the steps we take to prevent it.

I believe that it is un-American to peacefully advocate
that the American Government should operate all steel,
mining, transportation, and manufacturing industries.

I believe that every person will set aside his principles
when the rewards for doing so are high enough.


Subjects were asked to indicate on a special [...]ns to the
statements as [...] my mind; (4) I am inclined [...]
statement; and (5) No, [...] this statement. On t[...]
assumed that the subject’s [...] he required for the
profe[...] teacher.

The reliability of the inventory was studied by means of a
test-retest procedure. Thirty-four [...]


The average number of the 120 items on which the class gave
identical answers a week later was 73.8.

The average number of the 120 items on which the class
members reversed themselves was 15.8.


“Reversing themselves” was defined as indicating
“Definitely believing” or “Inclined to believe” during one
responding period and indicating “Definitely not believing”
or “Inclined not to believe” the same item during the other
responding period. Changes in response may be attributed to
the effectiveness of the instructions during the
intervening week, or to the degree of reliability of the
instrument. Methods for “scoring” the instrument are
described in a later Section of this report.

Summary. — The three data gathering devices used in this
study have been described. They are (1) the Evaluation of
Teaching Practices, (2) the Evaluation of Teacher Factors,
and (3) the Inventory of Beliefs. All sought the subjects’
concept of the professional role of his good teacher.
Methods for “scoring” the instruments are described in a
later Section of this report.


SECTION IV 


Analysis of Data and Conclusions 


AS HAS already been said, a modified form of the
causal-comparative method of research was employed. It was
hypothesized that the lengths of professional distances
increase as teacher ratings decrease from good to average
and from average to poor. Professional distance is
suggested by the frequency and divergency of disagreements
between the points of view on what constitutes the
professional role of the good teacher. The greater the
frequency of disagreements, the greater the professional
distance; the greater the divergency of disagreements, the
greater the professional distance.

Professional distance “scores” are computed for the number
and divergency of disagreements between each rater and the
teachers he rated. Scores are associated with the teachers,
such as: The score of the teacher rated good is 145.
Actually, the score is not the teacher’s any more than it
is the rater’s, since it signifies the extent of the
disagreements between them. However, for convenience,
throughout this discussion, the rather lengthy expression,
“the score for the distance between the rater and the
teacher he rated good”, is abbreviated to “A’s score”.
Likewise, “the score for the distance between the rater and
the teacher he rated average” is “B’s score”, and “the
score for the distance between the rater and the teacher he
rated poor” is “C’s score”.

If the professional distance scores are lowest for the A
teachers, higher for the B teachers, and highest for the C
teachers, then the hypothesis will be considered as
supported by the evidence. If the professional distance
scores appear in some other pattern, then the hypothesis
will be considered as not supported by the evidence.


TABLE II

PROFESSIONAL DISTANCE SCORES FOR TEACHER PRACTICES

School       Teacher       Teacher          Teacher
Code No.*    Rated Good    Rated Average    Rated Poor

 1.          22            30               18
 2.          28            17               24
 3.          21            30               30
 4.          28            36               36
 5.          36            36               24
 6.          25            26               27
 7.          30            42               40
 8.          30            35               36
 9.          10            26               44
10.          41            30               51
11.          22            27               34
12.          33            39               32
13.          30            31               33
14.          16            25               19
15.          31            29               31
16.          25            29               32
17.          34            34               29
21.          24            32               42
22.          27            25               35
23.          34            29               26
24.          38            42               37
25.          25            34               37
26.          23            29               27
27.          39            24               27
28.          27            22               26
30.          33            30               34
31.          24            35               29
32.          27            18               29
33.          22            33               32
34.          26            28               42

*Code numbers were assigned to each school to assure their
anonymity. It may be reported, however, that the numbers 1
through 17 represent the first community studied, and 21
through 34 represent the second community. The school in
the second community that was to be designated number 29
was eliminated for reasons explained in Section II under
the heading “Application of the Plan of Research”.

This section is divided into three parts. Part 
One reports the analysis of data for profession- 
al distance (frequency and divergency of disa- 
greements). Part Two reports an analysis for 
frequency of disagreements, without regard for 
their divergency. Part Three reports on item 
analysis for critical items. In each part the 
three instruments are analyzed separately.


Part One: Professional Distance


Teacher Practices. — The instrument for measuring
professional distances for teacher practices consisted of
53 items which the subjects classified as “good”, “poor”,
or “makes no difference”, i.e., neither good nor poor. The
responses of each teacher were compared with those of her
rater. Differences in their responses were assigned the
following weights:


Weight of 2: One professional worker classifying the
practice as good; the other worker classifying it as poor.

Weight of 1: One professional worker classifying a
practice as making no difference; the other classifying it
as good or as poor.
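
The weighting rule above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the category labels, and the four sample responses are hypothetical and do not come from the study.

```python
# Hypothetical sketch of the teacher-practices weighting rule.
# Category labels and the sample responses are illustrative assumptions.

VALID = {"good", "poor", "no difference"}


def practice_weight(teacher: str, rater: str) -> int:
    """Weight for one item: 0 for agreement, 1 when one party answers
    'no difference' and the other 'good' or 'poor', 2 for good vs. poor."""
    if teacher not in VALID or rater not in VALID:
        raise ValueError("unknown category")
    if teacher == rater:
        return 0
    if "no difference" in (teacher, rater):
        return 1
    return 2  # one party says good, the other poor


def practice_distance(teacher_items, rater_items) -> int:
    """Professional distance score: the sum of the item weights."""
    return sum(practice_weight(t, r) for t, r in zip(teacher_items, rater_items))


teacher = ["good", "poor", "no difference", "good"]
rater = ["good", "good", "good", "no difference"]
score = practice_distance(teacher, rater)  # weights 0, 2, 1, 1
```

With the sample responses the item weights are 0, 2, 1, and 1, so the sketch returns a distance of 4.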


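The assigned weights were summed to determine the
professional distance score for teacher practices. These
scores are shown in Table II.

Professional distance for teacher practices is shorter for
A teachers than for either B or C teachers in 16 of the 30
schools, and longer for C teachers than for either A or B
teachers in 13 of the 30 schools. In 20 schools it is
shorter for A teachers than for C teachers.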

Beliefs. — [...] The following weights were assigned to
differences between the responses to each statement:


Weight of 4: One party definitely believing; the
other, definitely not believing.

Weight of 3: One party definitely believing; the 
other, inclined not to believe. 

Weight of 3: One party definitely not believing; 
the other, inclined to believe. 

Weight of 2: One party inclined to believe; the 
other, inclined not to believe. 

Weight of 2: One party responding that he can- 
not say; the other, either definite- 
ly believing, or definitely not be- 
lieving. 

Weight of 1: One party definitely believing; the 
other, inclined to believe. 

Weight of 1: One party definitely not believing; 
the other, inclined to not believe. 

Weight of 1: One party responding that he can- 
not say; the other, either inclined 
to believe, or inclined to not be- 
lieve. 


The assigned weights were summed to deter- 
mine the professional distance score for beliefs. 
These are shown in Table III. 
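
The listed belief weights behave like distances on a five-point scale. The sketch below assumes a numeric coding of the five responses from 1 to 5, which the text does not state, and checks that the absolute difference between coded positions reproduces every weight listed above. All names are hypothetical.

```python
# Assumed coding of the five responses; the source gives only the weights.
SCALE = {
    "definitely believe": 1,
    "inclined to believe": 2,
    "cannot say": 3,
    "inclined not to believe": 4,
    "definitely not believe": 5,
}

# The weights as listed in the text, keyed by response pair.
LISTED = {
    ("definitely believe", "definitely not believe"): 4,
    ("definitely believe", "inclined not to believe"): 3,
    ("definitely not believe", "inclined to believe"): 3,
    ("inclined to believe", "inclined not to believe"): 2,
    ("cannot say", "definitely believe"): 2,
    ("cannot say", "definitely not believe"): 2,
    ("definitely believe", "inclined to believe"): 1,
    ("definitely not believe", "inclined not to believe"): 1,
    ("cannot say", "inclined to believe"): 1,
    ("cannot say", "inclined not to believe"): 1,
}


def belief_weight(a: str, b: str) -> int:
    """Weight for one statement under the assumed 1-5 coding."""
    return abs(SCALE[a] - SCALE[b])


def belief_distance(teacher, rater) -> int:
    """Professional distance score for beliefs: the sum of the weights."""
    return sum(belief_weight(t, r) for t, r in zip(teacher, rater))


# True if the assumed coding agrees with every listed weight.
coding_matches = all(belief_weight(a, b) == w for (a, b), w in LISTED.items())
```

Under this coding, agreement on a statement contributes nothing, so summing over all 120 statements yields scores of the kind reported in Table III.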

Professional distance for beliefs is shorter 
for A teachers than for either B or C teachers 
in 11 schools of the 30 studied. It is longer for 
C teachers than for either A or B teachers in 14
schools. In 19 schools, professional distance 
for A teachers is shorter than that for C teach- 
ers. 

A second analysis of the belief inventory was made in which
only assigned weights for differences that suggested
opposition were summed. Such differences were those
assigned weights of 2 in which one professional worker was
inclined to believe and the other worker was inclined not
to believe. The resultant scores, representing Oppositional
Professional Distance for Beliefs, are shown in Table IV.

Oppositional professional distance for beliefs is shorter
for A teachers than for either B or C teachers in 14
schools. It is longer for C teachers than for either A or
B teachers in 17 schools. In 19 schools it is shorter for
the A teachers than it is for the C teachers.

TABLE III

PROFESSIONAL DISTANCE SCORES FOR BELIEFS

School       Teacher       Teacher          Teacher
Code No.     Rated Good    Rated Average    Rated Poor

 1.          171           195              210
 2.          167           140              140
 3.          148           129              168
 4.          127           171              179
 5.          157           179              201
 6.          181           155              175
 7.          176           158              138
 8.          211           149              162
 9.          180           172              203
10.          185           179              207
11.          146           173              186
12.          164           176              159
13.          143           133              142
14.          114           114              143
15.          140           197              176
16.          161           160              182
17.          182           174              165
21.          165           174              187
22.          190           176              194
23.          145           199              155
24.          229           215              193
25.          188           183              189
26.          157           179              159
27.          139           118              148
28.          102           114              193
30.          136           185              131
31.          179           179              161
32.          153           [illegible]      201
33.          165           181              176


TABLE IV

OPPOSITIONAL DISTANCE SCORES FOR BELIEFS

School       Teacher       Teacher          Teacher
Code No.     Rated Good    Rated Average    Rated Poor

 1.          99            133              162
 2.          102           83               78
 3.          93            60               115
 4.          78            102              120
 5.          75            84               109
 6.          105           92               127
 7.          130           124              88
 8.          158           18               [illegible]
 9.          101           19               111
10.          155           161              187
11.          92            101              119
12.          112           105              97
13.          104           86               92
14.          50            64               65
15.          104           156              138
16.          97            101              124
17.          126           136              156
21.          100           118              123
22.          158           99               150
23.          84            139              92
24.          203           129              147
25.          160           109              167
26.          113           156              90
27.          73            78               [illegible]
28.          53            124              13
30.          16            130              [illegible]
31.          118           113              [illegible]
32.          131           101              [illegible]
33.          [illegible]   [illegible]      [illegible]
34.          [illegible]   [illegible]      [illegible]


Teacher Factors. — The instrument to measure professional
distance for teacher factors consisted of 25 items found on
the teacher rating scale of the first community studied.
Subjects were asked to rank the importance of each factor
on the following five-point scale: (1) of utmost
importance; (2) very important; (3) important; (4) usable,
but not important; (5) insignificant. The responses of each
teacher were compared with those of her rater, and the
differences between each of their responses were assigned
the following weights:


Weight of 4: One party ranking the factor of utmost
importance; the other, as insignificant.

Weight of 3: One party ranking a factor as of utmost
importance; the other, usable but not important.

Weight of 3: One party ranking the factor very
important; the other, insignificant.

Weight of 2: One party ranking the factor as
important; the other, either of utmost importance, or
insignificant.

Weight of 2: One party ranking the factor as very
important; the other, usable but not important.

Weight of 1: One party ranking the factor very
important; the other, either of utmost importance, or
important.

Weight of 1: One party ranking the factor as usable but
not important; the other, either important, or
insignificant.


The assigned weights for the differences between the
responses were summed to arrive at a professional distance
score for teacher factors. These scores are shown in
Table V.
Professional distance for teacher factors is shorter for A
teachers than for either B or C teachers in 13 of the
schools studied. It is longer for C teachers than for
either A or B teachers in 8 schools. In 16 schools,
professional distance is shorter for A teachers than it is
for C teachers.

A second type of analysis was made of the assigned weights.
First, the assigned weights were marked plus (+) if the
teacher ranked the factor as more important than did her
rater. All other weights were marked minus (-). Second, the
plus and minus weights were summed algebraically to arrive
at a Compensated Score of Professional Distance for Teacher
Factors. The assumption for such an analysis was that a
teacher would not be rated lower if she thought a teacher
factor less important than her rater did, provided that she
thought some other factor more important than did her
rater. Since plus and minus values were summed
algebraically, Compensated Scores could be zero (0), a
positive quantity, or a negative quantity. In interpreting
such scores zero would suggest the absence of professional
distance. Direction of professional distance would be
indicated by the sign: positive scores suggest that the
teacher classifies factors as more important than her
rater; negative scores suggest that she classifies them as
less important than her rater. Length of professional
distance is indicated by the integer. When two scores are
compared for length of professional distance and one is
negative, only the integers are compared. Thus, in school
number 34 the A teacher’s score of plus 5 is considered to
be a shorter professional distance than the C teacher’s
score of minus 7. Compensated Scores are shown in Table VI.

Professional distance, measured by such an analysis, is
shorter for A teachers than either B or C teachers in 12 of
the schools studied. It is longer for C teachers than for
either A or B teachers in 12 schools. In 18 schools the A
teacher’s professional distance is shorter than the C
teacher’s.

A third analysis was made of the assigned weights. Plus and
minus signs were added similarly to the method applied in
the second analysis, but then only the negative values were
summed to arrive at a Less Than Score. The assumptions were
that no compensation factor operated in any teacher being
considered good, average, or poor; that her thinking a
factor more important than her rater’s opinion of the same
factor has no bearing on her rating as a teacher; and that
only her thinking factors to be less important than her
rater’s opinion of them bears on her rating as a teacher.
Less Than Scores are shown in Table VII. In interpreting
these scores, zero suggests the absence of professional
distance, and the higher the score the greater the
professional distance.

Such an analysis indicates that professional distance is
shorter for the A teacher than for either the B or C
teacher in 9 of the 30 schools studied. It is longer for C
teachers than for A or B teachers in 11 schools. In 18
schools professional distance for A teachers is shorter
than for C teachers.

Summary of Analyses for Professional Distance. —
Professional distance scores were computed according to a
variety of described procedures for each of the three
instruments. For each procedure, three counts were found:
(1) the number of schools in which the A teacher’s
professional distance score was lower than either the B or
C teacher’s, (2) the number of schools in which the C
teacher’s professional distance score was higher than
either the A or B teacher’s score, and (3) the number of
schools in which the A teacher’s score was lower than the C
teacher’s score. These are summarized in Table VIII.
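
The three counts can be illustrated with the Table II scores for teacher practices. The sketch assumes strict comparisons, so that ties do not count, which the text does not state explicitly. Under that assumption it reproduces the counts reported for teacher practices (16, 13, and 20 schools). The function name is hypothetical.

```python
# (A, B, C) scores per school, transcribed from Table II in school order.
TABLE_II = [
    (22, 30, 18), (28, 17, 24), (21, 30, 30), (28, 36, 36), (36, 36, 24),
    (25, 26, 27), (30, 42, 40), (30, 35, 36), (10, 26, 44), (41, 30, 51),
    (22, 27, 34), (33, 39, 32), (30, 31, 33), (16, 25, 19), (31, 29, 31),
    (25, 29, 32), (34, 34, 29), (24, 32, 42), (27, 25, 35), (34, 29, 26),
    (38, 42, 37), (25, 34, 37), (23, 29, 27), (39, 24, 27), (27, 22, 26),
    (33, 30, 34), (24, 35, 29), (27, 18, 29), (22, 33, 32), (26, 28, 42),
]


def tally(scores):
    """Return the number of schools in which (1) A is lower than both B
    and C, (2) C is higher than both A and B, and (3) A is lower than C.
    Ties are not counted (an assumption)."""
    a_lowest = sum(a < b and a < c for a, b, c in scores)
    c_highest = sum(c > a and c > b for a, b, c in scores)
    a_below_c = sum(a < c for a, b, c in scores)
    return a_lowest, c_highest, a_below_c


counts = tally(TABLE_II)
```

The same tally could be applied to the scores of any of the other analyses.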

Conclusion. — The data do not completely support the
hypothesis stated earlier. The shortest professional
distance is not always between the rater and the teacher he
rates good, nor is it always longest between the rater and
the teacher he rates poor. Depending upon the method of
analysis and the instrument, the number of schools in which
the A teacher’s score is lowest varies from 9 to 16.
Similarly, the number of schools in which the C teacher’s
score is the highest varies from 8 to 17. The number of
schools in which the A teacher’s score is less than the C
teacher’s score varies between 16 and 20, depending on the
method of analysis and the instrument.

TABLE V

PROFESSIONAL DISTANCE SCORES FOR TEACHER FACTORS

School       Teacher       Teacher          Teacher
Code No.     Rated Good    Rated Average    Rated Poor

 1.          19            35               26
 2.          16            30               29
 3.          9             19               14
 4.          19            17               21
 5.          25            31               22
 6.          23            32               20
 7.          22            29               17
 8.          28            16               21
 9.          28            19               19
10.          25            26               26
11.          21            30               40
12.          18            19               18
13.          19            15               17
14.          18            19               17
15.          14            17               13
16.          8             23               22
17.          12            21               16
21.          16            18               17
22.          15            18               11
23.          14            18               17
24.          18            25               11
25.          11            13               18
26.          12            13               17
27.          16            17               20
28.          16            15               22
30.          16            18               13
31.          16            18               25
32.          18            22               18
33.          26            31               20
34.          15            15               16


TABLE VI

COMPENSATED SCORES OF PROFESSIONAL DISTANCE FOR
TEACHER FACTORS

School       Teacher       Teacher          Teacher
Code No.     Rated Good    Rated Average    Rated Poor

 1.          9             33               22
 2.          0             30               25
 3.          -1            -9               [illegible]
 4.          11            1                21
 5.          25            31               18
 6.          -11           -28              -18
 7.          18            29               -9
 8.          28            -8               19
 9.          28            1                -5
10.          21            26               -4
11.          21            30               40
12.          -12           -11              -16
13.          17            15               [illegible]
14.          16            3                1
15.          -8            -6               -1
16.          4             -3               -16
17.          6             -15              [illegible]
21.          -8            -8               1
22.          3             [illegible]      9
23.          [illegible]   -10              -5
24.          -16           -21              1
25.          -1            3                10
26.          -8            -5               [illegible]
27.          0             2                5
28.          16            1                20
30.          2             12               [illegible]
31.          -2            16               20
32.          4             6                8
33.          22            21               14
34.          5             5                -7


TABLE VII

LESS THAN SCORES OF PROFESSIONAL DISTANCE FOR
TEACHER FACTORS

School       Teacher       Teacher          Teacher
Code No.     Rated Good    Rated Average    Rated Poor

 1.          5             1                2
 2.          8             0                2
 3.          5             14               9
 4.          4             8                0
 5.          0             0                2
 6.          16            30               19
 7.          2             0                13
 8.          0             13               1
 9.          0             9                12
10.          2             0                15
11.          0             0                0
12.          15            15               17
13.          1             0                9
14.          1             8                3
15.          11            12               7
16.          2             13               19
17.          3             18               8
21.          12            13               8
22.          6             12               1
23.          9             14               11
24.          17            23               6
25.          6             5                4
26.          10            9                13
27.          8             7                8
28.          0             2                1
30.          7             3                10
31.          9             1                0
32.          7             8                5
33.          2             2                3
34.          5             5                13


Apparently, the behaviors required by the teachers’ raters
for persons performing the role of their good teachers
function in a more complex manner. Behaviors required by
the raters may be classified into two or more
classifications. One classification may be considered
“core” behaviors, on which from the rater’s point of view
there is no controversy, and on which agreement is
necessary for a teacher to be considered good by him. A
second classification may be considered “peripheral”
behaviors, on which there exists an unresolved controversy
and on which differences in points of view may be
understood and accepted. The instruments used in this study
are loaded with items from the second classification, since
controversial items were sought for them. It may be that
instruments designed to require the rater to (1) state his
position, and (2) state whether he would accept alternate
behaviors, would possibly measure professional distance
more precisely.

Another factor seems to be operating in the measurement of
teachers. People do not disagree with one another with
equal amounts of tact. It seems possible that a teacher who
holds a point of view quite distant from that of her rater
may compensate for the possible conflict by being quite
tactful about their differences. On the other hand, a
teacher disagreeing on only a few issues may do so so
untactfully that a disproportionate weight is placed on her
divergent points of view.

A third factor may help explain the findings. Apparently,
some raters of teachers are more tolerant than others.
Raters of teachers who are tolerant of conflicting points
of view may accept frequent and divergent opinions and not
permit them to affect their ratings. On the other hand,
raters of teachers who are intolerant of conflicting points
of view may base their ratings, in part, on teachers’
non-acceptance of their points of view. These are only
three suggestions that may clarify the findings.


Part Two: Frequency of Disagreements 


Disagreements between raters and the teachers they rated
were next analyzed without regard for the amounts of their
divergencies. It was hypothesized that C teachers disagree
with their raters more frequently than do B teachers, who
in turn disagree with their raters more frequently than do
A teachers. Many types of disagreements were found.

Procedures for analyzing each instrument are reported
separately. The frequencies of each type of disagreement
along with partial and complete totals of disagreements for
each instrument are shown in Tables IX through XII. A
summary of these tables, indicating the number of schools
in which (1) the A teachers disagree less frequently than
do either the B or C teachers, (2) the C teachers disagree
more frequently than do either the A or B teachers, and
(3) the A teachers disagree less frequently than do the C
teachers, appears in Table XIII.

Teaching Practices. — Two types of differences were
recognized in analyzing the responses to this instrument.
When one professional worker classified the practice as
“good” and the other worker classified it as “poor”, the
difference was considered an oppositional difference. All
other disagreements were non-oppositional differences.
Frequencies for oppositional, non-oppositional, and total
differences for Teacher Practices for A, B, and C teachers
are shown in Table IX.
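
The two types of difference can be counted as in the short sketch below. The function name, category labels, and sample responses are hypothetical and serve only to illustrate the definitions above.

```python
# Hypothetical sketch: counting oppositional and non-oppositional
# differences on the teaching-practices instrument.


def count_differences(teacher_items, rater_items):
    """Return frequencies of oppositional, non-oppositional, and total
    differences between paired responses."""
    oppositional = 0
    non_oppositional = 0
    for t, r in zip(teacher_items, rater_items):
        if t == r:
            continue  # agreement is not a difference
        if {t, r} == {"good", "poor"}:
            oppositional += 1
        else:
            non_oppositional += 1
    return {
        "oppositional": oppositional,
        "non_oppositional": non_oppositional,
        "total": oppositional + non_oppositional,
    }


teacher = ["good", "poor", "no difference", "good"]
rater = ["good", "good", "good", "no difference"]
freq = count_differences(teacher, rater)
```

For the sample responses there is one oppositional difference and two non-oppositional differences, for a total of three.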

Inventory of Beliefs. — Four types of differences were
recognized in analyzing the responses to this instrument
and are named and defined as:


Type 1: Non-oppositional differences

a) One professional worker responding “cannot say”, the
other indicating any other response.

b) One subject responding “Definitely believing”, the
other responding “Inclined to believe”.

c) One subject responding “Definitely not believing”, the
other responding “Inclined not to believe”.


Type 2: Mild Opposition

One subject responding “Inclined to believe”, the other
responding “Inclined not to believe”.


Type 3: Moderate Opposition

a) One subject responding “Definitely believing”, the
other responding “Inclined not to believe”.

b) One subject responding “Definitely not believing”, the
other responding “Inclined to believe”.


Type 4: Strong Opposition

One subject responding “Definitely believing”, the other
responding “Definitely not believing”.
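
The four types can be expressed as a small classification function. This sketch assumes a numeric coding of the five responses from 1 to 5, which is not given in the text. The function names and sample responses are hypothetical.

```python
from collections import Counter

# Assumed coding of the five responses (not stated in the source).
SCALE = {
    "definitely believe": 1,
    "inclined to believe": 2,
    "cannot say": 3,
    "inclined not to believe": 4,
    "definitely not believe": 5,
}


def disagreement_type(a: str, b: str) -> int:
    """Return 0 for agreement, otherwise the type (1 to 4) defined above."""
    pa, pb = SCALE[a], SCALE[b]
    if pa == pb:
        return 0
    if 3 in (pa, pb) or (pa < 3) == (pb < 3):
        return 1  # "cannot say" is involved, or both lean the same way
    definite = sum(p in (1, 5) for p in (pa, pb))
    return 2 + definite  # 2 = mild, 3 = moderate, 4 = strong opposition


def type_frequencies(teacher, rater):
    """Frequency of each type of difference over paired responses."""
    counts = Counter(disagreement_type(t, r) for t, r in zip(teacher, rater))
    return {k: counts.get(k, 0) for k in (1, 2, 3, 4)}


teacher = ["definitely believe", "inclined to believe", "cannot say",
           "definitely believe", "inclined to believe"]
rater = ["definitely not believe", "inclined not to believe",
         "inclined to believe", "inclined not to believe",
         "inclined to believe"]
freq = type_frequencies(teacher, rater)
```

The sample pairs produce one difference of each type, and the fifth pair is an agreement that is not counted. Partial totals, such as Types 3 and 4 together, follow by adding the relevant frequencies.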


Frequencies for each type of disagreement for beliefs for
A, B, and C teachers are shown in Table X.

[...] differences. Total di[...] types of disagreements,
[...] Teachi[...]


TABLE IX

FREQUENCIES OF TYPES OF DIFFERENCES ON TEACHER PRACTICES

             Frequencies of
School       Oppositional      Non-oppositional    Total
Code No.     Differences       Differences         Differences
             A    B    C       A    B    C         A    B    C

 1.          4    6    4       14   18   10        18   24   14
 2.          2    5    4       24   7    14        26   12   19
 3.          3    4    4       15   22   22        18   26   26
 4.          5    8    1       18   20   22        23   28   29
 5.          12   14   8       12   8    8         24   22   16
 6.          4    5    5       17   16   17        21   21   22
 7.          3    13   2       24   16   36        27   29   38
 8.          6    8    9       18   19   18        24   27   27
 9.          1    5    17      8    16   10        9    21   27
10.          15   9    19      17   12   13        32   21   32
11.          7    2    5       8    23   24        15   25   29
12.          11   13   10      11   18   19        22   26   22
13.          10   11   9       10   9    15        20   20   18
14.          4    7    5       8    11   9         19   18   14
15.          8    5    4       21   19   23        29   24   27
16.          5    8    4       15   13   22        20   21   26
17.          13   9    7       8    16   15        21   25   22
21.          5    5    9       14   22   24        19   27   33
22.          7    3    10      13   19   15        20   22   25
23.          8    5    5       18   19   16        23   24   21
24.          3    2    3       32   38   31        35   40   34
25.          4    5    8       17   24   21        21   29   29
26.          2    5    4       19   19   19        21   24   23
27.          13   8    7       13   8    13        26   16   20
28.          9    6    11      9    10   4         18   16   15
30.          11   11   8       11   10   18        22   18   26
31.          5    4    7       14   27   15        19   31   22
32.          6    3    10      15   12   9         21   15   19
33.          5    9    7       12   15   18        17   24   25
34.          9    5    11      8    18   20        17   23   31



TABLE X

FREQUENCIES OF TYPES OF DIFFERENCES FOR BELIEFS

[Table body not legible in the source. Its columns give,
by school code number, the frequencies of Type 1, Type 2,
Type 3, and Type 4 differences for A, B, and C teachers.]


TABLE XI

PARTIAL AND COMPLETE TOTALS OF TYPES OF DIFFERENCES
FOR BELIEFS

School       Totals of         Totals of           Total of
Code No.     Types 3 and 4     Types 2, 3 and 4    All Types
             A    B    C       A    B    C         A    B    C

 1.          26   37   43      30   37   46        79   75   82
 2.          26   20   20      31   29   26        81   76   73
 3.          25   14   29      27   20   37        77   79   76
 4.          13   31   31      32   32   39        73   86   93
 5.          18   24   32      24   36   33        80   91   90
 6.          30   20   33      35   33   42        95   75   83
 7.          36   34   21      39   34   28        74   60   68
 8.          41   22   19      42   23   20        73   72   77
 9.          23   30   33      33   28   33        89   95   91
10.          40   42   50      45   42   50        71   58   70
11.          20   21   37      35   36   40        89   97   106
12.          31   28   27      33   34   28        70   90   76
13.          30   24   23      31   26   31        65   66   77
14.          9    10   15      19   27   23        76   74   84
15.          28   43   39      32   45   40        65   82   66
16.          29   28   33      30   33   41        88   88   92
17.          37   38   40      38   40   40        80   67   47
21.          27   32   34      34   36   38        81   76   81
22.          42   28   38      42   32   42        66   85   75
23.          24   40   23      26   40   31        [illegible]  89   85
24.          59   36   42      55   37   43        74   96   77
25.          42   28   45      43   33   45        65   88   61
26.          32   43   23      32   45   31        65   66   82
27.          17   21   19      24   25   34        74   100  81
28.          13   34   41      18   40   41        61   78   84
30.          22   39   16      24   41   28        81   90   79
31.          35   32   24      36   36   26        90   91   85
32.          37   28   49      38   28   50        57   45   72
33.          30   36   37      31   37   39        75   19   70
34.          31   22   23      34   33   26        81   78   75



and their raters were classified into two groups. The first
group contained those differences in which the teacher
thought the factor more important than did her rater; the
second group contained those differences in which the
teacher thought the factor less important than did her
rater. The frequencies for both groups of disagreements
together with the total number of disagreements for A, B,
and C teachers are shown in Table XII.

Summary of Frequencies of Disagreements. — The hypothesis
studied in Part II of this discussion was that C teachers
disagree with their raters more frequently than do either A
or B teachers, and that A teachers disagree with their
raters less frequently than do either B or C teachers.
Thirteen classifications of types of disagreements were
recognized and studied. Tables IX through XII were analyzed
to determine the number of schools in which (1) A teachers
disagree with their raters less frequently than do either B
or C teachers, (2) C teachers disagree with their raters
more frequently than do either A or B teachers, and (3) A
teachers disagree with their raters less frequently than do
C teachers. These numbers of schools are shown in
Table XIII.

Conclusions. — The data do not completely support the
hypothesis stated earlier in Part Two of this analysis.

Depending upon the instrument and the classification of
type of disagreements analyzed, the number of schools in
which A teachers disagree less frequently than do either B
or C teachers varies from 9 to 16. For 12 of the 13
classifications of types of disagreements the number of
schools in which A teachers disagree less frequently was
slightly less than 50 percent of the 30 schools studied. It
appears that the number of times A teachers disagree with
their raters less frequently than do either B or C teachers
is slightly less than the number of times they disagree
more frequently than either B or C teachers.

Depending upon the classification of types of disagreements
studied, the number of schools in which C teachers disagree
more frequently than do either A or B teachers varies from
8 to 14. This range would suggest that the number of
schools is always slightly less than 50 percent of the
schools studied. It appears, therefore, that the number of
times C teachers disagree with their raters more frequently
than do A or B teachers is slightly less than the number of
times they disagree less frequently than do A or B
teachers.

It is therefore concluded that the frequency of
disagreement slightly increases as ratings increase from
poor to average and from average to good.


Depending upon the classification of types of
disagreements, the number of schools in which
A teachers disagree with their raters less frequently
than do C teachers varies from 13 to 19
(average 16). Such an average is slightly more
than 50 percent of the 30 schools studied. It
is therefore concluded that the frequency of disagreement
decreases slightly as ratings increase
from poor to good. Such a conclusion
does not entirely contradict the conclusion
stated above, which compared frequency of disagreement
to ratings increasing from poor to
average and from average to good. It is suggested
that teachers rated average may not necessarily
be between good and poor, a finding suggested
by Lamke (19).

The interpretations of the findings of Part
One, Professional Distance, seem equally applicable
here. It was suggested that required behaviors
for performing the role of the good teacher
may exist as “core” behaviors, over which there
may be no controversy and little or no disagreement,
and “peripheral” behaviors, over which
controversy and disagreement are acceptable.
If this is so, then frequencies of disagreements
over both core and peripheral required behaviors
may have less meaning for comparisons
with teacher ratings.

Secondly, the suggestion, made in the conclusions
to Part One, that the manner of disagreement
may be a potent factor along with the number
of disagreements seems to apply to the findings
on frequencies of disagreements. These are
only two suggestions that may clarify the findings
on frequency of disagreements.


Part Three: Item Analysis 


The differences between the points of view of
the teachers and their raters were analyzed by
items. It was hypothesized that if A teachers
disagreed less frequently and less divergently
from their raters than did either B or C teachers
on certain items, those items may be considered
critical in that agreement with one's rater
may be associated with a teacher being considered
good by her rater. For convenience,
such items were called “A teacher items”. Similarly,
it was hypothesized that if C teachers
disagreed more frequently and more divergently
from their raters than did either A or B teachers
on certain items, those items may be considered
critical in that disagreement with one's
rater may be associated with a teacher being
considered poor by her rater. Such items were
called “C teacher items”.

Teacher Practices. — This instrument consisted
of 53 teacher practices which subjects
classified as “good”, “poor”, or “makes no
difference”.




TABLE XII 


FREQUENCIES OF TYPES OF DIFFERENCES FOR TEACHER FACTORS 


Frequencies of Differences

            Teachers thinking      Teachers thinking      Teachers thinking
            item more important    item less important    different from rater
School
Code No.    A    B    C            A    B    C            A    B    C
i. 11 17 13 2 H 2 13 18 19 
9. 8 18 18 5 0 2 18 18 20 
3. 4 5 5 4 10 9 8 15 14 
4. 13 8 17 4 6 0 LT 14 17 
5. 18 18 13 0 0 2 18 18 15 
6. 5 2 1 12 13 17 19 14 
ths 16 22 4 1 11 17 22 15 
8. 1T. T 14 0 1 17 17 15 
9 18 q 5 0 10 18 13 15 
10 21 22 9 2 13 23 22 22 
11. 14 21 24 24 0 14 21 24 
12. Б 4 1 12 12 15 15 13 
13. 12 13 6 1 6 13 13 12 
14. 13 9 10 0 3 13 16 13 
15. 3 5 6 9 6 12 14 12 
16. 5 9 3 2 13 7 21 15 
17. 8 3 6 3 6 11 18 12 
21. 3 5 8 10 7 13 16 15 
29. 8 6 9 6 1 14 17 10 
23. 4 3 4 8 9 12 14 13 
24. 1 2 5 14 5 15 17 10 
25. 5 7 11 6 4 11 19 15 
26. 2 4 4 9 10 11 13 14 
21. 7 8 9 7 7 14 14 16 
28. 13 12 15 0 T 13 14 16 
30. 7 13 3 6 9 13 15 12 
91. 6 14 18 8 0 14 15 18 
32 18 18 11 2 3 20 19 14 
33 10 12 12 7 5 17 19 17 
5 10 15 13 16 


34. 10 9 6 




TABLE XIII

SUMMARY OF THE FREQUENCY TABLES

Instruments and Types of Differences

Teacher Practices
    Oppositional Differences
    Non-oppositional Differences
    Total Differences

Inventory of Beliefs
    Type 4 (Strong Opposition)
    Type 3 (Moderate Opposition)
    Type 2 (Mild Opposition)
    Type 1 (Non-Opposition)
    Sub-Totals
        Types 4 and 3
        Types 4, 3, 2 (All Opposition)
        Types 4 through 1 (All Differences)

Teacher Factors
    First Group
    Second Group
    Total Differences


10 


13 


13 


10 


13 


13 


C disagrees more frequently than A or B


10 


10 


11 


11 


11 


14 


14 


10 


Number of Schools of the 30 Studied in which

A disagrees less frequently than B or C

A disagrees less frequently than C


17 


18 


18 


13 


16 


13 


15 


17 


19 


18 




Two types of disagreements were used in the item analysis. The first type
were called oppositional disagreements and
were defined as those in which one subject classified
the practice as “good”, while the other
classified it as “poor”. The second type of disagreements
were called total disagreements and
were defined as those in which the two professional
workers responded differently from one
another. For each item the number of oppositional
disagreements and total disagreements
between the A teachers and their raters were
found. Similarly, for each item the number of
oppositional disagreements and total disagreements
for B and C teachers were found. These
data are shown in Table XIV.

In analyzing Table XIV, A teacher items were
defined as those on which (1) the number of disagreeing
A teachers is zero, while the number
of disagreeing B or C teachers is one or more,
or (2) the number of disagreeing A teachers is
50 percent or less of the number of disagreeing
B teachers or C teachers, whichever is less.
Items numbered 2, 3, 23, 24a, 24c, and 41 are
A teacher items.

C teacher items were defined as those on
which (1) the number of disagreeing C teachers
is one or more, while the number of disagreeing
A or B teachers is zero, or (2) the number
of disagreeing C teachers is 200 percent or
more of the number of disagreeing A or B teachers,
whichever is larger. Items numbered 2,
4, 5, 6d, 19, and 22 are C teacher items.
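The two definitions can be sketched as a pair of small predicates. This is a minimal illustration, not the author's procedure: the function names are hypothetical, and the handling of zero counts is an assumption (an item is treated as an A teacher item only when both the B and C counts are at least one, and as a C teacher item only when the C count is at least one). The rows are oppositional frequencies for a few items of Table XIV.

```python
def is_a_item(a: int, b: int, c: int) -> bool:
    """A count is 50 percent or less of the smaller of the B and C counts.

    Assumption: both B and C must be at least one; this also covers
    rule (1), where the A count is zero.
    """
    lowest_other = min(b, c)
    if lowest_other == 0:
        return False
    return 2 * a <= lowest_other


def is_c_item(a: int, b: int, c: int) -> bool:
    """C count is 200 percent or more of the larger of the A and B counts.

    Assumption: C must be at least one; this also covers rule (1),
    where the A and B counts are both zero.
    """
    if c == 0:
        return False
    return c >= 2 * max(a, b)


# Oppositional frequencies (A, B, C) for selected items of Table XIV.
oppositional = {
    "2": (0, 1, 4),
    "3": (1, 2, 2),
    "4": (0, 0, 2),
    "6d": (2, 0, 4),
    "12": (4, 5, 4),
    "23": (0, 1, 1),
}

a_items = [item for item, row in oppositional.items() if is_a_item(*row)]
c_items = [item for item, row in oppositional.items() if is_c_item(*row)]
print(a_items)  # ['2', '3', '23']
print(c_items)  # ['2', '4', '6d']
```

The same predicates could be applied to the total frequencies; only a subset of rows is shown here, so the output is not the complete list of critical items.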

Conclusions to Item Analysis of Teacher
Practices. — The data support the hypothesis
that there are certain items on which A teachers
disagree with their raters less frequently
than do either B or C teachers. Six such items
were found. They are:


2. Stands at the side of the room. 
3. Stands at the rear of the room. 

23. Organizes subject matter into psycho- 
logically arranged form (from pupils’ 
experiences to logical generalizations). 

24a. Assignments: page to page in textbook. 

24c. Assignments: general topics and nothing 
more. 

41. Measures results of learning by changed 
pupils’ attitudes and behaviors. 


Further, the data support the hypothesis that
on certain items C teachers disagree with their
raters more frequently than do either A or B
teachers. Six such items were found. They are:


2. Stands at the side of the room.
4. Sits at desk.
5. Sits on pupil's desk at front of room.
6d. Sits in pupil's seat at rear of room.
19. Provides for individual differences by
differentiating assignments (the contract
plan, unit instruction, level assignments,
etc.).
22. Organizes subject matter in problem-project
form.


One item, number 2, appears to be both an 
A and C teacher item. Apparently, agreement 
with one's rater on the evaluation of the prac- 
tice of standing at the side of the room may be 
associated with a teacher being considered a 
good teacher, while disagreement with one's rat- 
er on the evaluation of this practice may be as- 
sociated with a teacher being considered a poor 
teacher. 

Individual items and their implications are 
discussed in more detail in the summary and con- 
clusions to Part Three. 

The Inventory of Beliefs. — This instrument
consisted of 120 statements of beliefs related to
education. Subjects were asked to select one of
the following responses as their answer: (1) Yes,
I definitely believe this statement; (2) I am inclined
to believe this statement; (3) I cannot say;
(4) I am inclined not to believe this statement;
or (5) No, I definitely do not believe this statement.
Responses of the teachers were compared
with those of their raters, and weights
were assigned to the differences as in Part
One of this analysis.

Two approaches were followed in analyzing 
this instrument by items. The first approach 
considered frequencies of disagreements without 
regard to the degree of divergencies. The sec- 
ond approach considered both frequencies and 
divergencies. Four analyses were made of the 
data prepared for each of the two approaches. 
Critical items are quoted in the summary and 
conclusions of the item analysis of this instru- 
ment. 

Analysis of Frequencies of Disagreements. —
Two classifications of disagreements were used
in the analysis. The first classification, called
oppositional disagreements, was defined as those
assigned weights of 4 or 3 plus those assigned
a weight of 2 when one subject responded, “I
am inclined to believe this statement,” and the
other responded, “I am inclined not to believe
this statement.” The second classification,
called total disagreement, was defined as those
on which the teacher responded differently
from her rater. For each item the number
of A teachers who opposed their raters and
who responded differently from them were found.
Similarly, the number of B and C teachers who
opposed and responded differently from their
raters were found for each item. These frequencies
of A, B, and C teachers are shown in
Table XV.
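The classification of a single teacher-rater pair can be sketched as follows. The weighting scheme itself is described in Part One and is not reproduced in this section, so the sketch assumes that the weight is the absolute difference between the two response codes (1 through 5); that assumption, and the function names, are illustrative only.

```python
def weight(teacher: int, rater: int) -> int:
    """Assumed weight: absolute difference between response codes 1..5."""
    return abs(teacher - rater)


def is_oppositional(teacher: int, rater: int) -> bool:
    """Weights of 4 or 3, or a weight of 2 between responses (2) and (4)."""
    w = weight(teacher, rater)
    if w >= 3:
        return True
    # "Inclined to believe" (2) against "inclined not to believe" (4).
    return w == 2 and {teacher, rater} == {2, 4}


def is_total_disagreement(teacher: int, rater: int) -> bool:
    """Any difference between the two responses."""
    return teacher != rater


print(is_oppositional(2, 4))  # True
print(is_oppositional(1, 3))  # False: weight 2, but "cannot say" is involved
```

Under this assumed weighting, a weight of 2 arises in three ways (1 against 3, 2 against 4, 3 against 5), and only the middle case is counted as opposition, which matches the definition given in the text.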

In analyzing the oppositional frequencies, A
teacher items were defined as any item on which
the number of A teachers who oppose their rat- 




TABLE XIV 


ITEM ANALYSIS OF TEACHING PRACTICES 


Item        Oppositional      Non-oppositional        Total
Number*     A    B    C       A    B    C          A    B    C
1.          3    2    2      14   16   13         17   18   15
2.          0    1    4      16   15   11         16   16   15
3.          1    2    2      17   15   14         18   17   16
4.          0    0    2      15   23   20         15   23   22
5.          0    0    1      17   14   13         17   14   14
6a.         0    1    0      16   15   21         16   16   21
6b.         2    3    1      15   18   23         17   21   24
6c.         3    2    2      12   16   19         15   18   21
6d.         2    0    4      14   16   17         16   16   21
7.          2    3    4      16   15   14         18   18   18
8.          0    0    0      13   12   14         13   12   14
9.          2    0    1      10    5   10         12    5   11
10.         3    5    3      17   19   16         20   24   19
11.         3    3    2      16   13   15         19   16   17
12.         4    5    4      18   15   18         22   20   22
13.         6    5    3      12   13   15         18   18   18
14.         5    5    8      11   16   16         16   21   24
15.         7    5    8      11   10   13         18   15   21
16a.        9    5    8       6   10    9         15   15   17
16b.        8    8    9       1    2    4          9    5    7
16c.        9   10   19       8    4    4         19   14   17
16d.        5    6    5       6    4    5         11   10   10
16e.        4    2    7       7    8    5         11   10   12
17.         2    1    2       1    1    1          3    2    3
18.         7    7    7       6    6    3         13   13   10
19.         0    0    1       1    1    2          1    1    3
20.         5    5    5       5    7    3         10   12    8
21.        13    7   13      11   13    8         24   23   21
22.         1    0    4       8    7    8          9    7   12
23.         0    1    1       6    5    9          6    6   10
24a.        2    3    5       2    5    4          4    8    9
24b.        6    6    6       3    9    7          9   15   13
24c.        1    3    3       4    5    3          5    8    6
24d.        9    5    6       4   11    5         13   16   11
24e.        0    1    0       3    3    5          3    4    5
25.         6   10   11       9    8    7         15   18   18
26.         3    5    5       4    7    4          7   12    9
27.         3    5    4      12   15   13         15   20   17
28.         3    5    5       5    6    7          8   11   12
29.         0    0    1       2    3    3          2    3    4
30.         0    0    0       6    6    6          6    6    6
31.         4    5    4       7    7    7         11   12   11
32.         7    8    7      10   12   11         17   20   18
33.         7    7    7      10   10   11         17   17   18
34.         5    4    5       8   11   11         13   15   16
35.         6    6    4       7   12   14         13   18   18
36.         3    2    3       6    6    6          9    8    9
37.         4    5    3       2    3    3          6    8    6
38.         3    6    4       3    3    5          6    9    9
39.         8   13   10       3   10    5         11   23   15
40.         6    3    6       4   12    8         10   15   14
41.         2    3    1       1    5    5          3    8    6
42.         6    8    3       6    8    8         12   16   11
ing the patternof the sous ed i rae use letter Suffixes, follow 




TABLE XV 


ITEM ANALYSIS OF BELIEFS RELATED TO EDUCATION: FRE- 
QUENCIES OF OPPOSITIONAL AND TOTAL DIFFERENCES 


                Oppositional         Total
Item            Differences        Differences
Number          A    B    C        A    B    C
1. 12 8 9 17 19 19 
9. 9 12 11 20 21 23 
3. 10 11 14 14 19 17 
4, 11 19 10 29 22 20 
5. 10 6 9 20 17 20 
6. 7 7 5 20 15 14 
7. 1 6 4 8 11 10 
8. 9 12 11 15 21 17 
9. 10 12 9 21 19 15 
10. 0 0 0 9 18 9 
11. 5 7 8 15 18 15 
12; 16 8 8 20 20 16 
13. 4 4 5 15 17 13 
14, 4 4 5 9 15 9 
15. 13 13 11 22 25 23 
16. 13 14 15 25 28 24 
її. 12 14 16 21 20 18 
18. 1 4 2 10 12 10 
19. 4 4 8 20 18 19 
20. 9 9 9 20 22 23 
ο 
91. 15 10 14 24 19 22 
22. 8 10 9 22 25 21 
23. 10 10 11 23 20 22 
24, 12 11 19 21 22 27 
25. 8 4 5 17 17 15 
26. 9 12 17 22 21 23 
27. 9 7 4 15 14 13 
28. 3 3 3 8 16 9 
29. 8 T 6 17 19 15 
30. 12 10 11 18 23 21 
31. 14 10 11 22 21 19 
32 12 11 11 23 23 21 
33 8 10 13 22 24 23 
34. 13 10 8 18 20 18 
35 8 3 7 14 11 17 
36 16 13 13 25 24 22 
37. 16 13 9 25 25 25 
38. 2 4 0 14 15 20 
39. 5 5 5 14 16 16 
40. 5 10 8 17 22 18 
41. 14 7 6 21 25 23 
42. 2 4 1 20 18 18 
43. 8 11 15 17 23 25 
44. 4 6 9 12 12 15 
45. 6 8 10 20 24 23 
46. 1 5 8 17 21 23 
4T. 7 4 7 13 12 19 
48. 7 5 7 23 22 20 
49. 3 4 4 19 22 19 
7 T 8 16 18 19 




TABLE XV (Continued) 


                Oppositional         Total
Item            Differences        Differences
Number          A    B    C        A    B    C

51. 8 11 9 22 25 23 
52. 12 9 15 20 23 21 
53. 3 6 1 15 25 18 
54, 12 6 6 19 17 15 
55. 9 9 9 21 23 24 
56. 10 14 9 20 27 20 
57. 10 7 13 20 20 23 
58. 10 13 10 18 23 25 
59. 11 9 14 25 26 25 
60. 6 7 9 22 18 19 
61 9 7 10 21 23 22 
62 15 8 15 27 22 25 
63 8 10 8 20 24 23 
64 13 11 15 23 23 26 
65 9 12 12 25 24 22 
66 15 14 13 26 26 27 
67 5 5 5 17 17 18 
68 5 17 13 17 25 22 
69 12 13 15 24 25 24 
70 11 9 13 24 23 23 
fe 14 14 14 22 24 24 
72. 14 11 9 19 16 16 
73. 9 7 7 25 21; 26 
74. 6 7 6 20 20 20 
75. 6 10 11 18 24 24 
76. 8 8 7 18 19 17 
77. 5 5 5 14 20 16 
78. 5 6 5 15 21 19 
79. 13 6 14 21 21 22 
80. 11 11 12 24 25 21 
81. 3 2 8 8 13 12 
82. 10 17 7 23 24 25 
83. 2 4 1 12 12 11 
ры 1 4 2 16 20 13 
85. 4 9 4 19 22 21 
86. 12 11 9 24 23 19 
er. 5 9 79 15 19 22 
88. 11 8 8 20 15 14 
89. 12 15 11 24 22 19 
90. 13 9 15 20 23 23 
91. 5 4 11 20 
32. 9 7 10 25 18 22 
93. 7 11 8 22 94 23 
94, 9 11 11 26 22 23 
95. 10 6 7 21 18 15 
29. T 6 5 ΠΠ ЖЫҚ. 
97. 9 19 15 17 21 23 
98. 1 2 1 11 11 13 
29. 5 6 7 21 οὐ. 24 
100. 8 9 13 22 22 24 
101. 11 14 14 21 23 27 
102 11 12 




TABLE XV (Continued) 


Oppositional Total 
Item Differences Differences 
Number A B C A B C 
103. 13 13 16 23 24 22 
104. 13 15 Lf 23 22 22 
105. 10 8 10 23 23 29 
106. 13 14 13 24 23 22 
107. 6 6 5 14 16 13 
108. 16 12 11 24 20 20 
109. 3 0 2 12 11 10 
110. 12 12 15 24 24 24 
111. 11 8 15 26 25 26 
112. 5 T 11 15 19 19 
Lis. > 4 5 14 16 14 
114. 4 2 3 12 13 12 
115. 2 3 3 12 9 15 
116. 7 6 9 21 20 21 
1142 10 11 7 20 26 20 
118. 2 5 3 8 13 12 
119 11 14 11 24 22 26 
120 1 0 3 ІН 12 13 




ers was 67 or less percent of the number of B 

or number of C teachers, whichever is less,
who oppose their rater. Seven items, those
numbered 7, 18, 40, 44, 68, 75, and 84, fit the
definition.

In analyzing the total differences frequencies,
A teacher items were defined as any item on
which the number of A teachers who disagree
with their raters was 67 percent or less of the
number of B or number of C teachers, whichever
is less, who disagree with their raters. Items
numbered 81, 105, 108, and 118 fit the
definition.

In analyzing the oppositional frequencies for
C teacher items, they were defined as any item
on which the number of C teachers who oppose
their rater is 150 percent or more of the
number of A or number of B teachers, whichever
is more, who oppose their raters. Four
items, numbered 19, 24, 91, and 102, fit the
definition.

In analyzing the total differences frequencies
for C teacher items, they are defined as those
on which the number of C teachers who disagree
with their raters is 150 percent or more of the
number of A or number of B teachers, whichever
is more, who disagree with their raters.
There are no such items.
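The 67 percent and 150 percent rules can be sketched with integer arithmetic so that borderline cases are not affected by rounding. The function names are hypothetical, and the treatment of zero counts is an assumption (the comparison group must have at least one disagreeing teacher). The rows are oppositional frequencies for a few items of Table XV.

```python
def a_item(a: int, b: int, c: int, pct: int = 67) -> bool:
    """A count is pct percent or less of the smaller of the B and C counts.

    Assumption: the smaller of B and C must be at least one.
    """
    ref = min(b, c)
    return ref > 0 and 100 * a <= pct * ref


def c_item(a: int, b: int, c: int, pct: int = 150) -> bool:
    """C count is pct percent or more of the larger of the A and B counts.

    Assumption: the C count must be at least one.
    """
    ref = max(a, b)
    return c > 0 and 100 * c >= pct * ref


# Oppositional frequencies (A, B, C) for selected items of Table XV.
oppositional = {
    1: (12, 8, 9),
    7: (1, 6, 4),
    18: (1, 4, 2),
    19: (4, 4, 8),
    24: (12, 11, 19),
    40: (5, 10, 8),
}

print([i for i, row in oppositional.items() if a_item(*row)])  # [7, 18, 40]
print([i for i, row in oppositional.items() if c_item(*row)])  # [19, 24]
```

Item 40 shows why the threshold matters: 5 is 62.5 percent of 8, so it qualifies at 67 percent but would not at the 50 percent level used for Teacher Practices.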

Analysis of Divergencies of Disagreements. —
Two analyses of the weights assigned to the disagreements
were made. The first considered
the weights assigned to oppositional disagreements.
The second considered all assigned
weights.

In the first analysis an oppositional weight
for each item was computed for the A teachers
by summing the weights assigned to the oppositional
disagreements between these teachers
and their raters. For the second analysis total
weights for each item were computed for the A
teachers by summing the weights assigned to all
disagreements between these teachers and their
raters. Similarly, oppositional weights and total
weights were computed for each item for the
B and C teachers. These data are shown in
Table XVI.
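The two sums for a single item can be sketched as below. As before, the weight is assumed to be the absolute difference between the teacher's and the rater's response codes (1 through 5), since the weighting from Part One is not reproduced in this section; the function name and the response pairs are hypothetical.

```python
from typing import List, Tuple


def item_weights(pairs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (oppositional weight, total weight) for one item.

    Each pair is (teacher response, rater response) on the 1..5 scale.
    Assumption: weight = absolute difference of the response codes.
    """
    oppositional = 0
    total = 0
    for teacher, rater in pairs:
        w = abs(teacher - rater)
        total += w
        # Oppositional: weight 4 or 3, or weight 2 between responses 2 and 4.
        if w >= 3 or (w == 2 and {teacher, rater} == {2, 4}):
            oppositional += w
    return oppositional, total


# Hypothetical responses of six teachers and their raters to one item.
pairs = [(1, 5), (2, 4), (1, 3), (2, 3), (4, 4), (5, 2)]
print(item_weights(pairs))  # (9, 12)
```

Computed separately for the A, B, and C teachers, the two values would fill the six columns of one row of a table such as Table XVI.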

In analyzing the oppositional weights, A
teacher items were defined as those on which
the oppositional weight of the A teachers was 67
percent or less of the oppositional weight of the
B or C teachers, whichever is less. Nine items,
numbered 7, 18, 40, 44, 68, 75, 84, 103, and
108, fit the definition. Only one item, number
103, was not detected using the analysis for
frequencies.

In analyzing the total weights for A teacher
items, they were defined as any item on which
the total weight of the A teachers was 67 percent
or less of the total weight of the B or C teachers,
whichever is less. Items numbered 7 and
68, both found previously, were found to be
critical.

In analyzing the oppositional weights for C
teacher items, they were defined as those on
which the oppositional weight for the C teachers
was 150 percent or more of the oppositional
weight of the A or B teachers, whichever is more.
Items numbered 19, 24, 91, 112, and 120 were
found to be critical. Items 112 and 120 had not
been detected using other analyses.

In analyzing the total weights for C teacher
items, they were defined similarly to the pattern
above. No new items were found.

Conclusions to the Item Analysis of the Inventory
of Beliefs. — The data seem to support the
hypothesis that there are certain items on which
A teachers disagree with their raters less frequently
than do either B or C teachers. Twelve
such items were found. They are:


7. I believe that pupils should be permitted
to call teachers by their nicknames or
given names.

18. I believe that teaching offers a wide variety
of interesting experiences.

40. I believe that today’s schooling makes
too many students consider unskilled and
semiskilled positions as not good enough
for them.

44. I believe that teachers should teach
students to side with the majority on controversial
issues.

68. I believe that in hiring an individual for
a job, it is often advisable to include
race, color, and religion, in making your
selection.

75. I believe that it is reasonable to fire a
teacher who admits he is a Socialist.

81. I believe that private profits are essential
to any successful economic system.

84. I believe that free enterprise has proved
its superiority over other types of economic
enterprises for America.

103. I believe that churches cause needless
strife by over-emphasizing the differences
among groups.

105. I believe that churches should take better
care of their own parishioners rather
than spend money on missions in foreign
countries.

108. I believe that public schools should provide
released time from classes for religious
instruction.

118. I believe that working with people is better
than working with things.


Further, the data seem to support the hypothesis
that on certain items, C teachers disagree
with their raters more frequently than do
either A or B teachers. Six such items were
found. They are:




TABLE XVI 


ITEM ANALYSIS OF BELIEFS RELATED TO EDUCATION: 
WEIGHTED SCORES OF OPPOSITIONAL AND TOTAL 


DIFFERENCES 
                Oppositional         Total
Item              Weights           Weights
Number          A    B    C        A    B    C
1, 38 27 12 44 41 42 
В. 26 39 38 39 48 50 
3; 37 34 45 43 44 48 
4. 41 41 30 52 50 41 
δ. 36 21 32 47 32 44 
6. 24 26 17 42 38 31 
N. 4 20 15 12 25 21 
8. 29 38 37 35 47 43 
9. 36 37 29 50 44 37 
10. 0 0 0 9 13 9 
11 18 26 24 29 38 32 
12 35 29 26 50 47 38 
13 14 11 18 28 28 26 
14 13 12 17 18 24 22 
15 44 43 36 54 58 49 
16 88 43 46 53 61 59 
17 40 44 49 50 41 51 
18 4 13 7 14 21 15 
19 13 13 24 35 31 39 
20 31 30 31 46 44 49 
° 
21. 53 30 47 65 42 58 
22. 28 32 31 46 49 45 
23. 35 30 38 49 41 51 
24, 35 33 61 47 49 13 
25. 26 13 15 37 28 28 
26. 30 41 53 46 51 62 
27. 27 24 12 33 31 22 
28. 11 11 7 17 25 13 
29. 23 23 20 32 36 30 
30. 32 35 31 39 49 42 
31. 48 32 37 57 48 47 
32. 36 35 34 48 47 46 
33. 26 32 4 45 49 58 
34. 42 30 27 49 42 39 
35. 27 8 25 34 17 36 
36. 55 40 43 66 52 53 
37. 4 5 38 + и. 
a 6 13 0 20 28 26 
nol 16 M 17 27 31 29 
40. 18 33 27 36 49 41 
41. 46 24 19 56 50 43 
ax 6 12 3 25 28 24 
43. 27 38 49 37 50 62 
44 13 20 32 21 28 40 
A 23 27 31 41 48 47 
46 24 17 24 35 37 41 
AT. 25 12 28 33 31 42 
48 23 16 28 46 40 46 
49 8 12 11 з1 3T 33 
22 25 30 33 39 47 




TABLE XVI (Continued) 


                Oppositional         Total
Item              Weights           Weights
Number          A    B    C        A    B    C
51. 24 36 30 41 52 48 
52. 34 28 46 43 46 54 
53. 9 22 2 26 44 26 
54. 34 18 21 42 34 33 
55. 25 28 27 41 45 44 
56. 33 46 31 47 61 46 
57. 32 21 43 46 39 56 
58 31 40 33 41 56 54 
59 35 27 45 53 47 60 ‚ 
60 20 19 28 40 35 41 
61 31 23 35 41 43 54 
62 48 28 53 66 48 68 
63 27 30 28 45 50 47 
64 43 36 50 54 52 64 
65 28 38 37 49 53 49 
66 48 43 36 62 57 53 
67 18 19 17 36 33 35 
68 ІЛ 61 39 30 70 49 
69 37 37 43 32 50 53 
70 33 25 38 44 41 49 
71 46 44 45 54 54 56 
12 53 42 35 60 49 44 
73 26 24 26 48 44 57 
74 16 22 16 38 40 35 
15 21 34 34 36 52 55 
76 30 29 20 43 47 35 
TT 16 16 15 31 35 30 
78 17 18 16 30 38 37 
79 43 20 48 52 39 59 
80 35 38 37 55 ББ 49 
81 11 6 11 17 20 22 
82 30 52 22 44 64 44 
83 6 16 4 17 25 16 
n 4 16 6 25 38 22 
85 12 29 13 32 49 35 
аб ба « ВТ, 2p G0 55 43 
Ы if +20 7 30 39 35 
88 38 28 25 49 35 32 
91 16 13 36 33 32 52 
93 19 34 26 38 48 43 
94 27 29 31 46 41 45 
95 30 20 22 46 37 31 
98 3 7 3 16 19 18 
99 17 20 22 36 42 49 
100 23 27 38 40 45 54 




TABLE XVI (Continued) 


                Oppositional         Total
Item              Weights           Weights
Number          A    B    C        A    B    C
101. 34 45 46 45 55 59 
102. 36 38 36 49 49 48 
103. 47 45 53 61 60 62 
104. 43 46 57 56 54 62 
105. 33 23 32 47 42 48 
106. 48 50 48 62 64 58 
107. 21 22 17 30 35 28 
108. 51 39 36 61 49 45 
109. 10 0 7 20 14 17 
110. 40 43 53 55 59 63 
111. 32 25 41 49 46 62 
112. 15 20 39 26 34 47 
113. 8 16 16 21 30 26 
114. 14 4 11 26 22 24 
115 7 9 10 19 1T 25 
116. 24 17 25 41 36 40 
117. 33 39 22 45 59 41 
118. 6 19 9 13 27 18 
119 38 43 37 55 55 57 
120 3 0 10 14 14 22 




19. I believe that in teaching, promotions 
are based on who you know rather than 
on what you know. 

24. I believe that teachers should be free 
to use alcoholic beverages. 

91. I believe that trade unions have done
more harm than good in our industrial
progress.

102. I believe that people who claim to be 
religious are less tolerant than people 
who do not claim to be religious. 

112. I believe that most people will take ad- 
vantage of you. 

120. I believe that a large amount of money 
is a prerequisite to success. 


A more detailed interpretation of the items 
and their implications is presented in the sum- 
mary and conclusions to Part Three. 

Teacher Factors. — This instrument consisted
of 25 teacher factors which subjects classified
on a five-point scale from “utmost importance”
to “insignificant”. The responses of the
teachers were compared with those of their
raters. When differences existed between the
responses of the teacher and those of her rater,
weights were assigned as in Part One of
this section. Plus and minus designators were
added similarly to the procedure used in the
Compensated Score analysis.

Two approaches were followed in analyzing
the positive and negative weights. The first
approach considered frequency of weights without
regard to divergencies. The second approach
considered both frequencies and divergencies.
Four specific procedures were followed
to analyze the data prepared for each of
the two approaches. Critical items are quoted
in the summary and conclusions of the item analysis
for Teacher Factors.

Procedures for Analyzing the Frequencies
of Disagreements. — Following the first approach,
frequencies of positive weights and negative
weights were found for the A teachers for
each item. Similarly, frequencies of positive
and negative weights were found for the B and
C teachers for each item. These data are shown
in Table XVII.

In analyzing the frequencies of positive
weights in Table XVII for A teacher items, it
was hypothesized that A teachers would place
more importance on the teacher factors than
would B or C teachers. Therefore, for the first
specific procedure, A teacher items were defined
as items on which the number of A teachers
who classified the factor as more important
than did their raters is 200 percent or more
of the number of B or number of C teachers,
whichever is higher, who classified it as
more important than did their raters. There
are no such items.




For specific procedure number two, A teacher
items were defined as items on which the
number of A teachers who classified the factor
as less important than did their raters is 50 percent
or less of the number of B or number of C
teachers, whichever is less, who classified it
as less important than did their raters. Items
9, 12, and 16 fit the definition.

In analyzing Table XVII for C teacher items,
it was hypothesized that C teachers would place
less importance on the teacher factors than
would A or B teachers. Therefore, for specific
procedure number three, C teacher items were
defined as items on which the number of C teachers
who considered the factor more important
than did their raters is 50 percent or less of the
number of A teachers or number of B teachers,
whichever is less, who consider the factor as
more important than did their raters. There
are no such items.

For specific procedure number four, C teacher
items were defined as items on which the number
of C teachers who classified the factor as
less important than did their raters is 200 percent
or more of the number of A or number of
B teachers, whichever is more, who classified
it as less important than did their raters. There
are no such items.

Procedures for Analyzing Divergencies and
Frequencies. — Following the second approach,
a positive weighted score for A teachers was
computed for each item by summing the positive
weights assigned to the differences between
the responses of the A teachers and their raters.
Likewise, a negative weighted score for A teachers
was computed for each item by summing the
negative weights assigned to the differences between
them. Similarly, sets of positive and negative
weighted scores for each item were computed
for the B and C teachers. These data
are shown in Table XVIII.

Continuing the study of the same hypothesis
proposed for the analysis of the frequencies of
disagreements, for specific procedure number
five, A teacher items were defined as items on
which the positive weighted score of the A teachers
is 200 percent or more of the positive weighted
score of the B or C teachers, whichever is
higher. No item fits the definition. However,
one item, number 21, seems to reverse
the hypothesis to a marked degree in that the positive
weighted score for A teachers was 50 percent
of the B teachers’ score and 55 percent of
the C teachers’ score.

For specific procedure number six, A teacher
items were defined as items on which the negative
weighted score of the A teachers is 55 percent
or less of the negative weighted score of
the B or C teachers, whichever is less. Items
numbered 9, 12, 24, and 25 fit the definition.

Continuing the C teacher aspect of the hy- 




TABLE XVII 


ITEM ANALYSIS OF TEACHER FACTORS: FREQUENCIES OF ASSIGNED 
POSITIVE AND NEGATIVE WEIGHTS 


Item A Teachers B Teachers C Teachers 
Number Pos. Neg. Pos. Neg. Pos. Neg. 

1. 5 4 7 2 6 6 
2 18 5 20 3 16 6 
3. 14 Т 13 8 14 7 
4. 13 6 12 9 12 9 
5 9 8 9 8 7 12 
6 12 5 13 7 9 7 
7 13 5 14 8 14 4 
8. 11 7 10 14 10 8 
9. 7 2 5 11 5 7 
10. 14 ο 6 18 9 13 8 
11. 13 7 14 11 10 3 
12. 17 1 22 4 17 3 
13. 10 9 12 Y 13 9 
14. 8 8 12 10 10 10 
15. 5 q 4 5 2 7 
16. 11 2 12 5 12 4 
17. 12 1 11 5 11 1 
18. 14 6 8 10 11 7j 
19 11 10 11 8 10 9 
4 13 8 11 6 

o" ы 9 8 12 8 12 
22. 7 10 15 5 Š 10 
23. 14 4 14 8 15 5 
24. 16 5 14 1 е : 
25 10 6 13 8 11 11 




TABLE XVIII 
TEACHER FACTORS: ITEM ANALYSIS OF WEIGHTED SCORES 


A Teachers B Teachers C Teachers 
Item Pos. Neg. Pos. Neg. Pos. Neg. 
Number Score Score Score Score Score Score 
ДЫ 5 6 8 3 7 7 
2: 25 6 27 4 23 7 
3 17 1 17 9 18 9 
4 18 8 17 11 20 11 
5 13 8 11 9 9 15 
6 14 7 17 5 12 10 
7 18 т. 17 12 19 4 
8 15 8 11 17 за 9 
9 8 2 6 11 6 8 
10. 14 6 19 10 14 10 
ΤΙ, 14 9 18 14 15 5 
12. 26 1 26 6 25 3 
13 13 9 18 8 17 10 
14 13 11 18 16 15 11 
15. 6 7 5 8 3 7 
16. 13 3 13 7 15 4 
17. 13 1 12 5 11 1 
18. 17 6 15 11 15 10 
19. 14 12 15 11 13 13 
20. 16 6 17 12 16 8 
21. И 15 14 18 13 15 
22. 9 11 20 6 11 12 
8 қаш DESC 0004 
3 19 
25. 14 6 10 17 10 


20 1i 16 16 




pothesis stated above, for specific procedure 
number seven, C teacher items were defined as
items on which the positive weighted score of
the C teachers was 50 or more percent of the
positive weighted score of the A or B teachers,
whichever is lower. There are no such items.

For specific procedure number eight, C
teacher items were defined as items on which
the negative weighted score of the C teachers is
200 percent or more of the negative weighted
score of the A or B teachers, whichever is higher.
There are no such items.

Summary of the Eight Specific Procedures. —
Differences between the points of view of the
teachers and their raters were assigned positive
weights when the teacher classified the factor
as more important than did her rater, and
negative weights when the teacher classified it
as less important than did her rater. For each
item of the Teacher Factor instrument frequencies
of positive and negative weights were obtained
for the A, B, and C teachers. Secondly,
for each item positive weighted scores and negative
weighted scores were obtained for the A,
B, and C teachers. In greatly abbreviated form
the eight specific procedures may be summarized
with percentages and type of data indicated:


1. A is 200% of B or C using plus frequencies.

2. A is 50% of B or C using minus frequencies.

3. C is 50% of A or B using plus frequencies.

4. C is 200% of A or B using minus frequencies.

5. A is 200% of B or C using positive weighted
scores.

6. A is 55% of B or C using negative weighted
scores.

7. C is 60% of A or B using positive weighted
scores.

8. C is 200% of A or B using negative weighted
scores.
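Procedures 2 and 6, the only two of the eight that identified items, can be sketched with one shared predicate. The function name is hypothetical and the zero-count guard is an assumption. The rows are the negative frequencies (Table XVII) and negative weighted scores (Table XVIII) of the A, B, and C teachers for four items.

```python
def a_at_most(pct: int, a: int, b: int, c: int) -> bool:
    """A value is pct percent or less of the smaller of the B and C values.

    Assumption: the smaller of B and C must be greater than zero.
    """
    ref = min(b, c)
    return ref > 0 and 100 * a <= pct * ref


# (A, B, C) negative frequencies, selected items of Table XVII.
neg_freq = {9: (2, 11, 7), 12: (1, 4, 3), 16: (2, 5, 4), 17: (1, 5, 1)}
# (A, B, C) negative weighted scores, selected items of Table XVIII.
neg_score = {9: (2, 11, 8), 12: (1, 6, 3), 16: (3, 7, 4), 17: (1, 5, 1)}

procedure_2 = [i for i, row in neg_freq.items() if a_at_most(50, *row)]
procedure_6 = [i for i, row in neg_score.items() if a_at_most(55, *row)]
print(procedure_2)  # [9, 12, 16]
print(procedure_6)  # [9, 12]
```

Item 16 illustrates the difference between the two kinds of data: it meets the 50 percent rule on frequencies (2 against 4) but not the 55 percent rule on weighted scores (3 against 4).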


Conclusions to the Item Analysis of Teacher 


Factors, — The data seem to support some as- 
to Cts of the hypothesis, stated earlier, and not 
tain рогі other aspects of it. There are сег” 
SOR items on which the number of A teachers, 
is classify teacher factors as less important 
nu n did their raters, is decidedly less than the 
Ru of B or C teachers, who do so. Three 
h items were found. They are: 


ist Provides for individual differences. 

- Has mastery of subject matter. | 

16. Obviously fair with pupils of all minor- 
ity groups. 


There are four items on which the degree of divergencies
of the responses of the A teachers who classified the
Teacher Factors as less important than did their raters
is decidedly less than the degree of divergencies of the
responses of the B and C teachers who did so. The four
items are 9 and 12, stated above, and:


24. Skillful in teacher-parent relationships. 
25. Assists in care and improvement of the 
school equipment, buildings, and grounds.


There was one item on which the hypothesis 
seemed to be reversed. The degree of diver- 
gency of the A teachers, who classified the fac- 
tor as more important than did their raters, was 
decidedly less than the degree of divergencies of 
the B or C teachers, who did so. This item is: 


21. Offers thoughtful comments and criti- 
cisms for improvement of the school. 


The data do not support the hypothesis in re- 
gard to C teacher items. On no items of the 
Teacher Factor instrument is the variation be- 
tween the C teachers and their raters decidedly 
different from the variations between the A 
teachers and B teachers and their raters. 

Individual items and their implications are 
discussed in more detail in the summary and 
conclusions to Part Three. 


Summary and Conclusions to Part Three: It- 
em Analysis. —Item analyses of all three meas- 
ures used to study professional distances were 
made. It was thought that if on certain items 
the number of teachers rated good who disagreed 
with their raters was very low as compared with 
the number of teachers rated average or poor 
who disagreed with their raters, such items may
be considered critical in that agreement on these
items with one's rater may be associated with a
teacher being considered good. Such items were
called A teacher items. Similarly, it was thought
that if on certain items the number of teachers 
rated poor who disagreed with their rater was 
very high as compared with the number of teach- 
ers rated good or average who disagreed with 
their raters, then those items may be consider- 
ed critical in that disagreement on these items 
with one's rater may be associated with a teacher
being considered poor. Such items were
called C teacher items. 

The disagreements between raters' and teachers' responses
to each item were analyzed using several different
approaches. For each approach definitions of A teacher
and C teacher items were established and applied to the
data. Of the 53 items on the Teacher Practices instrument,
six were A teacher items, and six were C teacher items.
Of the 120 items on the Inventory of Beliefs, twelve were
A teacher items, and six were C teacher items. Of the 25
items on the Teacher Factor instrument, five were A
teacher items, and none was a C teacher item. The number
of critical items on the three instruments would suggest
that of the 198 items studied, 23 are items on which
teachers rated good agree with their raters to a decided
degree more frequently than do teachers rated average or
poor, and 12 are items on which teachers rated poor
disagree with their raters to a decided degree more
frequently than do teachers rated good or average. A
total of 35 items were found to be critical.
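
The totals in the preceding paragraph can be checked with a short tally. The sketch below simply re-adds the counts reported in the text; the dictionary layout and variable names are illustrative choices, not part of the study.

```python
# Item counts per instrument as reported in the text:
# total items, A teacher items, C teacher items.
instruments = {
    "Teacher Practices": {"items": 53, "A": 6, "C": 6},
    "Inventory of Beliefs": {"items": 120, "A": 12, "C": 6},
    "Teacher Factors": {"items": 25, "A": 5, "C": 0},
}

total_items = sum(v["items"] for v in instruments.values())
a_items = sum(v["A"] for v in instruments.values())
c_items = sum(v["C"] for v in instruments.values())
critical_items = a_items + c_items

print(total_items, a_items, c_items, critical_items)
```

The sums agree with the figures given: 198 items studied, 23 A teacher items, 12 C teacher items, and 35 critical items in all.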

The A teacher items, numbers 2 and 3 of the Teacher
Practices instrument, suggest that teachers considered
good agree with their raters on where or where not to
stand while teaching. In contrast, four of the C teacher
items found on the Teacher Practices instrument suggest
that teachers rated poor disagree with their raters on
where or where not to stand (item 2) and where or where
not to sit (items 4, 5, and 6d).

The A teacher item, number 23 on the same instrument,
suggests that teachers rated good tend to agree with
their raters on the evaluation of the psychological
arrangement of subject matter, while the C teacher item,
number 22, suggests that teachers rated poor tend to
disagree on the evaluation of organizing subject matter
in problem-project form. Related to these practices seems
to be the A teacher item, number 9 of the Teacher Factors
instrument. Apparently, teachers rated good place more
importance on the mastery of subject matter than do B or
C teachers. Mastery of subject matter would be necessary
for rearrangement of it into psychological unit form.

Agreement with one's rater on the role of the teacher
concerning assignments seems to be a critical area [...]
differentiating assignments. The A [...] do B or C
teachers.


The last A teacher item of the Teacher Prac- 
tices was measuring the results of learning by 
changed pupil attitudes and behaviors. Apparently,
teachers rated good agree with their
raters on the evaluation of this practice more 
frequently than do B or C teachers. 

Agreement with one's rater on attitudes toward minority
groups was found to be a critical area. Items 16 of
Teacher Factors and 68
of the Inventory of Beliefs are in this category. 
Apparently, A teachers agree with their raters 
on the importance of being fair with pupils of all 
minority groups and believing or disbelieving 
that it is advisable to include race, color, and 
religion in hiring applicants for a position. The 
responses of C teachers did not differentiate 
themselves from B teachers in this area. 

The topic of a teacher as a member of a pro- 
fessional staff offered three critical items. As- 
suming that differences among the variation be- 
tween teachers’ and their raters’ classification 
of Teacher Factors indicate differences among 
A, B, and C teachers in their stress on their
behaviors, it would seem that A teachers stress
being skillful in teacher-parent relationships
and assisting in the care and improvement of
school equipment, building, and grounds more
than do B or C teachers, and they stress offering
thoughtful comments and criticisms for improvement
of the school less than do B or C
teachers. 

Agreements and disagreements on religion were shown to
be critical. In this area, three items, numbers 103, 105,
and 108 of the Inventory of Beliefs, were A teacher
items, and one item, number 102 of the instrument, was a
C teacher item. Apparently, teachers rated good tend to
agree with their raters more often and to a greater
extent on believing or disbelieving that churches cause
needless strife by overemphasizing differences among
groups, that churches should take greater care of their
parishioners rather than foreign missions, and that
schools should release school time for religious
instruction. Teachers rated poor tend to disagree more
frequently on believing or disbelieving that people who
claim to be religious are more tolerant than people who
do not claim to be religious.

A teachers tend to agree with their raters on items
eulogizing the profession. Items number 18 and 118 of
beliefs were found to be A teacher items. Apparently,
teachers rated good agree with their raters on believing
or disbelieving that teaching offers a wide variety of
interesting experiences and that working with people is
better than working with things. On the other hand, C
teachers tend to disagree with their raters on beliefs
suggesting less commendable attitudes toward the
profession. Items 19 and [...] were found as C teacher
items. Apparently, teachers rated poor disagree with
their raters on believing or disbelieving that in
teaching, promotions are based on who you know rather
than on what you know and that most people will take
advantage of you.

Several critical items were found in an area that may be
named Americanism and economics. On two items, numbers 75
and 81 on beliefs, teachers rated good were found to more
frequently agree with their raters on believing or
disbelieving that it is reasonable to fire a teacher who
admits he is a Socialist and that private profits are
essential to a successful economic system. Concerning
labor, item number 40 was found as an A teacher item and
items 91 and 120 of beliefs were found as C teacher
items. Teachers rated good apparently tend to join their
raters in believing or disbelieving that schools today
make too many students consider unskilled and
semi-skilled positions as not good enough for them, while
teachers rated poor tend to disagree with their raters on
believing or disbelieving that trade unions have done
more harm than good in our industrial progress and that a
large amount of [...] is a prerequisite to success. An A
teacher item somewhat related to this area was number 44.
Teachers rated good tend to agree with their raters in
believing or disbelieving that teachers should teach
students to side with the majority on controversial
issues.

Two items dealt with the personal conduct of teachers.
The A teacher item, number 7 of beliefs, suggests that
teachers rated good tend to agree with their raters in
believing or disbelieving that pupils should be permitted
to call teachers by their given names. Item number [...]
suggests that teachers rated poor tend to disagree with
their raters on believing or disbelieving that teachers
should be free to use alcoholic beverages.

No attempt has been made in this study to determine the
teacher behaviors that are good or poor. It was the
intention of the item analysis to determine on what items
A teachers tend to agree with their rater more frequently
than do B or C teachers, and on what items C teachers
tend to disagree with their raters more frequently than
do A or B teachers. Such A and C teacher items have been
found and reviewed. It is not claimed that the 35 items
presented here are the only critical items for all
raters. It is quite likely that certain raters have many
more critical items while others have many fewer. Still
others may have 35 different ones. The critical items
reviewed here were derived from rather arbitrarily
selected items by means arbitrarily, though logically,
selected.

It would seem that, in the evaluating of teachers, such
minor aspects as where a teacher stands or sits may tend
to attract the attention of the rater, especially if the
teacher is sitting when he thinks she should be standing.
However, it does not seem reasonable to believe that
performing contrary to the rater's expectation on this
issue would be sufficient reason to rate a teacher as
poor.

Other items such as organization and mastery
of subject matter, attention to individual
differences, fairness to minority groups, and 
being a member of a professional group seem 
to be more significant items. Performing up to 
or surpassing the rater’s expectations on such 
issues is undoubtedly sufficient reason to be con- 
sidered good, while performing contrary to or 
short of expectations on these issues seems suf- 
ficient reason to be considered poor. If this is 
so, it is probably significant that teachers rated 
poor disagree with their raters more frequently 
than do A or B teachers on the evaluation of 
these teacher practices and on the importance 
of these teacher factors. It may be that pre- 
service and in-service education of teachers 
should place more emphasis on attitude forma- 
tion toward these factors and practices along 
with knowledge and skill concerning them. That 
these factors are important has been suggested 
by other studies. That lack of performance in
these areas is considered cause for failure
and dismissal has been shown by Buellesfield 
(9), Madsen (21), and Nemec (22). 

On the other hand, disagreements on relig- 
ious items, eulogizing attitudes toward the pro- 
fession, Americanism and economics, and per- 
sonal conduct of teachers tend to remind one that 
performing the role of teacher is much more 
than performing instructional duties. It is prob- 
able that teacher evaluations are based, in part, 
on how closely one's behavior approximates the
expectations of one's rater on certain critical
areas. That proper behavior in these areas is
important has been suggested by the studies of
Edmiston (12), Buellesfield (9), and Madsen
(21). It is likely that pre-service education and
orientational activities for beginning teachers
may be modified to stress the broader aspects
of the role of teacher. Further, it is possible
that more compatible intra-staff relationships
may be attained by selection and placement of
personnel based on like beliefs on those issues
found critical with the rater. 

Lastly, it would seem that frequency and di- 
vergency of disagreements on certain issues 
between the rater and ratee are a factor 
in the teacher's evaluation. Therefore, to obtain
a more accurate rating from the professional
distance point of view, professional distances
between rater and ratee should be measured and,
if possible, kept constant, in the evaluation of
teachers. 

Section Summary. — Using a modification of 
the causal-comparative research method, the 
three data gathering instruments were analyzed 
from three points of view. 

The first studied the hypothesis that professional
distance, i.e., frequency and divergency of
disagreements, increased as ratings decreased from good
to average and average to poor. It was found that the
data presented here did not completely support such a
hypothesis. Depending upon the specific method of
analysis and the instrument, the number of schools in
which A teachers are a shorter professional distance from
their raters than are the B or C teachers varied from 9
to 16 of the 30 schools
studied. Likewise, the number of schools in 
which C teachers are further from their raters 
than are the A or B teachers varied from 8 to 
17. It was suggested that raters may accept 
alternate points of view on items known to be 
controversial without decreasing the teacher’s 
rating. Further, it was suggested that the manner
of disagreeing may be an important factor
along with the number of disagreements. Last- 
ly, it was suggested that some raters are more 
tolerant of conflicting points of view than others. 
The second approach studied the frequency 
of disagreements without regard to degree of 
divergencies. It was hypothesized that the 
teacher rated poor was the one who most fre- 
quently disagreed with the rater, while the 
teacher rated good most frequently agreed. The 
data seemed to indicate that frequency of disa- 
greement slightly increases as ratings increase 
from poor to average and average to good. However,
frequency of disagreement slightly decreases
as ratings decrease from good directly
to poor. Such conclusions do not entirely contradict
one another. It is suggested that teachers
rated average are not necessarily between
good and poor, a hypothesis suggested by other
research.


e frequently and 
or C teachers, 
chers disagree 


th C teacher and rat- 
ibuted eq ually be- 
beliefs related to 


education. None was concerned with the importance
of teacher factors.
It was suggested (1) that pre-service and in-service
education of teachers increase the emphasis given to
attitude formation toward such critical issues as
individual differences, organization and mastery of
subject matter, fairness to minority groups, and being a
member of a professional staff, (2) that pre-service
education and orientation of beginning teachers into the
profession be modified to increase the emphasis on the
broader aspects of teaching as a way of life, to include
such areas as the teacher's role on religious issues
affecting the school, eulogizing attitudes toward the
profession, Americanism and economics, and personal
conduct of teachers, (3) that, perhaps, more congenial
intra-staff relationships may be attained by placement of
personnel based on their points of view on critical
issues, and (4) that professional distance on critical
items must be considered in the evaluation of teachers to
get more accurate ratings from the professional distance
point of view.


SECTION IV 


Summary and Conclusions 


THIS STUDY attempts to show the relationship of
professional distance to teacher ratings. Professional
distance, adapted from the sociological term, social
distance, is defined as the frequency and divergency
between points of view held by professional workers on
what constitutes the professional role of the good
teacher. The greater the divergence and the more frequent
the disagreements, the longer the professional distance;
the lesser the divergence and fewer the disagreements,
the shorter the professional distance. Professional role,
adapted from the sociological term, social role, is
defined as the overt and covert behaviors required of a
person in a specific professional position. One's concept
of the professional role of the good teacher, or aspects
of it, serves as one's criterion to evaluate teaching.
The evaluation of one's own teaching is an expression
denoting the difference between one's concept of one's
performance and one's concept of the professional role of
the good teacher. The evaluation of another's teaching is
an expression denoting the difference between one's
concept of another's teaching and one's concept of the
professional role of good teaching. If a teacher, who is
approximating her own concept of teaching, is evaluated
by a rater whose concept of good teaching is quite
different from hers, the rating is apt to be poor. If the
same teacher is evaluated by a rater whose concept of
good teaching is similar to that of the teacher, then the
rating is apt to be good. Therefore, it is hypothesized
that lengths of professional distance increase as ratings
decrease from good to average and average to poor.

In reviewing researches in both education and sociology,
the terms, professional distance and professional role of
the good teacher, were not found. The concept of social
distance in place of “favorable” and “unfavorable”
attitudes toward minority groups was reported by Bogardus
in 1928. Assigning weights to degrees of social distance
was reported by Dodd in 1935. In perhaps all educational
research the presence of professional distance may be
seen. The terms, traits, ratings, pupil change, test
scores, and college grades, all imply specific behaviors.
When these are used as criteria for good teaching, the
behaviors required to attain desirable scores, ratings,
pupil changes, etc., are the behaviors required of
persons who would perform the professional role of the
rater's concept of good teaching. Selected studies of the
normative survey and correlational types were reviewed,
and evidences of professional distances were pointed out.
It was further suggested that correlations between sets
of teacher ratings increase as the likelihood for
professional distances decreases.

The methodology of research used in this study is a
modification of the causal-comparative technique. The
presence of the first phenomenon under investigation was
a teacher being considered a good teacher, and the
absence of the phenomenon was a teacher being considered
average or poor. The second phenomenon was a teacher
being considered a poor teacher, and its absence was a
teacher being considered good or average. These
definitions placed the average teacher in between the
good and poor teachers. Frequencies and divergencies of
disagreements with one's rater on the overt and covert
behaviors required of a person who performed the role of
the good teacher were studied as the circumstances
attendant to the presence of both phenomena.

In the application of the research technique, two
communities were studied. From the first community 17
school faculties were selected. From the second 13 school
faculties were selected. From each of the 30 faculties,
the principal served as the rater of his teachers. He, in
turn, selected a teacher he considered good, a teacher he
considered average, and a teacher he considered poor. The
groups of teachers were named A teachers, B teachers, and
C teachers for convenience. All subjects were drawn from
elementary school levels.

The measuring instruments for this study sought the
points of view of the subjects on what overt and covert
behaviors each required of the teacher playing the
professional role of his good teacher. Three instruments
were designed. The first consisted of 53 teacher
practices, which the subjects classified as “good”,
“poor”, or “makes no difference”, i.e., neither good nor
poor. The second instrument was 120 statements of beliefs
related to education. Subjects selected one of the
following as their response: I definitely believe this
statement; I am inclined to believe this statement; I
cannot say; I am inclined not to believe this statement;
and I definitely do not believe this statement. The third
instrument consisted of 25 teacher factors which subjects
evaluated on a five-point scale from “of utmost
importance” to “insignificant”. No claim was made that
only these 198 items make up one's total concept of the
professional role of the good teacher.

All raters and teachers responded to the in- 
struments. The responses of the teachers were 
compared with those of their rater. Differences 
between the responses were quantified. Those 
differences suggesting little divergence were 
weighted a small value. Those suggesting a
greater divergence, or opposition, were as- 
signed a larger value. 

Weights assigned to the differences were analyzed from
three points of view. The first considered professional
distance, frequency and divergency of disagreements; the
second considered frequency without regard to
divergencies; the third was an item analysis.
Professional distance scores were computed for the
disagreements between each rater and each teacher he
rated by summing the assigned weights. For convenience,
the professional distance scores were associated with the
teacher ratings. It was found that the data did not
completely support the hypothesis, stated above.
Depending upon the specific method of analysis and the
instrument, the number of schools in which A teachers are
a shorter professional distance from their raters than
are the B or C teachers varies from 9 to 16 of the 30
schools studied. Likewise, the number of schools in which
C teachers are a longer professional distance from their
raters than are the A and B teachers varies from 8 to 17.
It was suggested that raters may accept alternate points
of view on items known to be controversial without
decreasing the teacher's rating. Secondly, it was
suggested that the manner of disagreeing may be an
important factor along with the number of disagreements.
Lastly, it was suggested that some raters of teachers are
more tolerant of conflicting points of view than others.
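
The scoring just described, summing assigned weights over all items for a rater-teacher pair, might be sketched as below. The study states only that small divergences received small weights and opposition received larger ones. The rest of this sketch is an assumption made for illustration and is not taken from the study: responses are coded as integer scale positions, and the weight is the absolute difference between the teacher's and the rater's positions. All names and sample responses are hypothetical.

```python
def disagreement_weights(teacher, rater):
    """Per-item weights: 0 for agreement, larger for wider divergence
    (assumed here to be the absolute difference in scale position)."""
    if len(teacher) != len(rater):
        raise ValueError("teacher and rater must answer the same items")
    return [abs(t - r) for t, r in zip(teacher, rater)]


def professional_distance(teacher, rater):
    """Professional distance score: the sum of the assigned weights."""
    return sum(disagreement_weights(teacher, rater))


def disagreement_frequency(teacher, rater):
    """Number of items on which teacher and rater differ at all,
    ignoring the degree of divergence."""
    return sum(1 for w in disagreement_weights(teacher, rater) if w > 0)


# Hypothetical responses to five items on a five-point scale.
rater = [1, 3, 5, 2, 4]
teacher_a = [1, 3, 4, 2, 4]
teacher_b = [2, 3, 3, 2, 5]
teacher_c = [5, 1, 5, 2, 1]

for label, teacher in (("A", teacher_a), ("B", teacher_b), ("C", teacher_c)):
    print(label,
          professional_distance(teacher, rater),
          disagreement_frequency(teacher, rater))
```

In this made-up school the B and C teachers disagree with the rater equally often, yet the C teacher lies at a longer distance. That is the distinction the text draws between the first analysis (frequency and divergency) and the second (frequency alone).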

The second analysis studied the frequencies of
disagreements without regard to degree of divergence. It
was hypothesized that A teachers disagree less frequently
than do either B or C teachers, and C teachers disagree
more frequently than do either A or B teachers. The data
here presented seemed to indicate that frequency of
disagreement slightly increases as ratings increase from
poor to average and average to good. However, frequency
of disagreement slightly decreases as ratings decrease
from good directly to poor. Such conclusions do not
necessarily contradict one another. It is suggested that
teachers rated average are not necessarily between good
and poor teachers.

The third analysis studied the 198 items on the three
instruments. It was hypothesized that on certain items A
teachers agree with their raters more frequently and with
less divergence than do either B or C teachers, and that
C teachers disagree with their raters more frequently and
with greater divergence than do either A or B teachers.
Twenty-three items were found to support the A teacher
aspects of the hypothesis and twelve items supported the
C teacher aspect. It was suggested (1) that pre-service
and in-service education of teachers emphasize attitude
formation toward such critical items as individual
differences, organization and mastery of subject matter,
fairness toward minority groups, and being a member of a
professional staff, (2) that pre-service and orientation
activities for beginning teachers into the profession
increase the emphasis on the broader aspects of teaching,
(3) that, perhaps, more congenial intra-staff
relationships may be attained by placing personnel on the
basis of their points of view on critical items, and (4)
that professional distance on critical items must be
considered in the evaluation of teachers in order to get
more accurate ratings from the professional distance
point of view.


BIBLIOGRAPHY 


1. Almy, H. C. and Sorenson, H. “A Teacher-Rating Scale
of Determined Reliability and Validity,” Educational
Administration and Supervision, XVI (March 1930),
pp. 179-86.

2. Barr, A. S. Characteristic Differences in the Teaching
Performances of Good and Poor Teachers of the Social
Studies (Bloomington, Ill.: Public School Publishing
Co., 1929).

3. Barr, A. S. “The Measurement and Prediction of Teaching
Efficiency: A Summary of Investigations,” Journal of
Experimental Education, XVI (June 1948), pp. [...].

4. Barr, A. S. and Emans, L. M. “What Qualities are
Prerequisites to Success in Teaching?” Nation's
Schools, VI (September 1930), pp. 60-4.

5. Barr, A. S. and others. Supervision: Democratic
Leadership in Improvement of Learning, second edition
(New York: D. Appleton-Century Co., 1947).

6. Bogardus, E. S. Immigration and Race Attitudes
(Boston: D. C. Heath and Co., 1928).

7. Bogardus, E. S. “The Measurement of Social Distance,”
in Readings in Social Psychology, T. M. Newcomb and
E. L. Hartley, editors (New York: Henry Holt and Co.,
1947).

8. Bousfield, W. A. “Students' Ratings of Qualities
Considered Desirable in College Professors,” School
and Society, LI (February 1940), pp. 253-56.

9. Buellesfield, H. “Causes of Failure Among Teachers,”
Educational Administration and Supervision, I
(September 1915), pp. 439-45.

10. Cuber, J. F. Sociology: A Synopsis of Principles (New
York: D. Appleton-Century Co., 1947).

11. Dodd, S. C. “A Social Distance Test in the Near East,”
American Journal of Sociology, XLI (September 1935),
pp. 194-204.

12. Edmiston, R. W. and Cahill, C. M. “What Does the Rural
Community Expect of its Teachers?” Educational
Administration and Supervision, XXVI (February 1940),
pp. 98-102.

13. Encyclopedia of Educational Research, W. S. Monroe,
Editor (New York: Macmillan Co., 1950). Revised
Edition.

14. Encyclopedia of Social Sciences, E. R. Seligman,
Editor (New York: Macmillan Co., 1934).

15. Good, C. V., and others. The Methodology of
Educational Research (New York: D. Appleton-Century
Co., 1941).

16. Greenwood, E. Experimental Sociology (New York: King's
Crown Press, 1945).

17. Haggard, W. W. “Some Freshmen Describe the Desirable
College Teacher,” School and Society, LVII (September
1943), pp. 238-40.

18. Harris, C. W. “The Appraisal of a School — Problems
for Study,” Journal of Educational Research, XLI
(November 1947), pp. 172-82.

19. Lamke, T. A. “Personality and Teaching Success,”
Journal of Experimental Education, XX (December 1951),
pp. 217-57.

20. Lamson, E. F. “Some College Students Describe the
Desirable College Teacher,” School and Society, LVI
(December 1942), pp. 6-15.

21. Madsen, I. N. “The Prediction of Teaching Success,”
Educational Administration and Supervision, XIII
(January 1927), pp. 39-47.

22. Nemec, L. G. “Teacher Certification,” Journal of
Experimental Education, XV (September 1946),
pp. 101-32.

23. Newcomb, T. M. Social Psychology (New York: Dryden
Press, 1950).

24. Witty, P. A. “Evaluation of Studies of the Effective
Teacher,” in Improving Educational Research, official
report of the American Educational Research
Association (Washington, D.C.: American Educational
Research Association, 1948).

25. Wilson, L. and Kolb, W. Sociological Analysis (New
York: Harcourt, Brace and Co., 1949).

26. Woodworth, R. S. and Marquis, D. G. Psychology (New
York: Henry Holt and Co., 1947).

27. Young, P. V. Scientific Social Surveys and Research,
second edition (New York: Prentice-Hall, 1949).




AN INVESTIGATION OF THE NEW YORK 
STATE REGENTS EXAMINATIONS 
IN SCIENCE 


GEORGE GREISEN MALLINSON 
Western Michigan College of Education 
Kalamazoo, Michigan 
JACQUELINE V. BUCK 
Grosse Pointe Public Schools 
Grosse Pointe, Michigan 


Foreword 


A study as extensive as the one reported herein obviously
is not the work of one [...]. It represents the
coordinated efforts of a number of science educators who
have devoted [...] of work and personal finances to
accomplish a job that has long needed doing. Their
rewards for all practical purposes are intangibles,
chiefly the satisfactions from jobs well done. To give
adequate credit to these workers is impossible. Suffice
to say the list that follows contains the names of those
to whom no adequate credit can be expressed verbally.
Without their efforts, discussions about the New York
State Regents Examinations in Science would fall [...]
into the realm of speculation and conjecture. The list
follows:


Leo Alberti
Jacqueline V. Buck
ney V. DeBoer
rale A. Fuelling
Dag M. Mallinson
David J. Miller
James L. Pellowe
John J. Schmitt
Fred J. Service
Wayne A. Stafford
Harold E. Sturm
Kenneth E. Summerer
Richard G. Telfer


Many other persons contributed time and [...] in
providing advice, criticisms and suggestions in various
phases of the study. Among them are Mr. Wilton E. Baty,
Chairman of the Committee on Regents Examinations, New
York State Science Teachers Association; Mr. Hugh
Templeton, Supervisor of Science Education of the
University of the State of New York; Mr. [...] E. Van
Hooft, formerly President of the New York State Science
Teachers Association; Dr. J. Cayce Morrison, formerly
Coordinator of Research and Special Studies of the New
York State Department of Education; Dr. Warren K.
Findley, Director, Evaluation and Advisory Service,
Educational Testing Service; Dr. Kenneth E. Anderson,
Dean, School of Education of the University of Kansas;
Dr. Francis D. Curtis, Professor-Emeritus of Education
and of the Teaching of Science, University of Michigan;
and Miss Agnes Hodahl, formerly New York State
Representative of the National Science Teachers
Association, Albany, New York.


SECTION I 


THE EXPERIMENTAL DESIGN 


The Problem 


THE PROBLEM of this investigation is 
two-fold: (1) to investigate the attitudes of cer- 
tain science teachers from the State of New 
York toward the New York State Regents Examinations
in Science, and (2) to analyze and evaluate
certain characteristics of the Regents Examinations
for Biology, Chemistry, Earth Science,
and Physics prepared for the examination
periods of January 25, 1949; June 21, 1949; Jan- 
uary 24, 1950; and June 20, 1950. 


Background of the Study 


On December 23, 1949, the director of this study met with
Mr. Hugh Templeton, Supervisor of Science Education, of
the University of the State of New York to discuss
certain aspects of science teaching. During the course of
the conversation the work of a committee of the New York
State Science Teachers Association concerning the
attitudes of science teachers of New York State toward
the Regents Examinations was discussed. The committee,
under the chairmanship of Mr. Wilton E. Baty, Huntington
High School, Huntington, New York, had planned to poll a
number of science teachers of New York State to obtain an
objective analysis, for the first time, of their opinions
of the Regents Examinations in Science.

Many factors, among them time, personnel, and finances,
made it impossible for the committee to carry out its
task. Hence, through the good offices of Mr. Templeton,
the survey of opinions was delegated to the director of
this study, whose activities were still subject to the
approval of Mr. Baty's committee and Mr. Templeton.

Suffice to say the survey was duly carried out by
Mr. David J. Miller of Lakeview Junior High School,
Battle Creek, Michigan, and was prepared as a report
for his master's thesis at the University of
Michigan. With the approval of the University of the
State of New York the report was subsequently
published in an issue of Science Education.1*

During the months that followed the initial stages
of Miller's investigation, the director and Mr.
Templeton met on a number of occasions to discuss
the progress of the study. At one of these
conferences, it was indicated that an investigation
of the attitudes of the teachers was most desirable,
but that it was unlikely that such an investigation
would reveal the objective characteristics of the
examinations. It was decided, therefore, that the
director should prepare a research design that would
provide a means for evaluating the objective
characteristics usually evaluated in an examination
as well as certain characteristics unique to the
Regents Examinations in Science.

After several weeks the director submitted a design
to Mr. Templeton. The design was studied by Mr.
Templeton and other members of the State Department
of Education who were likely to be concerned. After
modifications were made in light of criticisms and
suggestions, it was decided that a sampling of the
Regents Examinations for Biology, Chemistry, Earth
Science and Physics should be item-analyzed and the
following factors considered:


1. A determination of the reliability, consistency
and validity of the examinations.

2. A comparison between the scores obtained by
students from small high schools and from large
high schools.

3. A comparison between the scores obtained by
girls and boys.

4. A determination of the levels of reading
difficulty and vocabulary load of the various
examinations.

5. An analysis of the types and frequencies of
scoring errors made by the teachers who scored the
examinations.

6. An analysis of the various test items on the
examinations in order to determine their
popularity, difficulty and discriminatory power.


The report that follows in Section II deals
with the factors just mentioned.


*Footnotes will be found at the end of the article.


SECTION II


SAMPLING THE REGENTS EXAMINATIONS IN 
SCIENCE AND TALLYING THE SCORES ON 
THE ITEMS 


The Problem 


THE PROBLEM of this phase of the investigation is
(1) to describe the procedure used in obtaining a
representative sampling of Regents Examinations in
Science for analysis, and (2) to describe the manner
in which the scores on the examination items were
tallied and summarized.


Obtaining a Representative Sampling of the Regents
Examinations in Science


In order to carry out the study it was necessary to
obtain copies of the examination papers after they
had been written by students in the State of New
York. Copies of these papers were available since
all the passing papers are forwarded to the State
Department of Education and are stored for one year
pending a possible review of the scores. Through the
cooperation of Mr. Peter Muirhead, of the University
of the State of New York, permission was received to
obtain the needed examination papers. It was agreed
that the number and type of papers needed would be
sent to the investigators provided that students'
names and locations were kept confidential, and
that, at the completion of the study, all papers
would be destroyed. (Suffice to say the agreement
was scrupulously followed and all papers were
subsequently destroyed.) One weakness in this phase
of the study is obvious. Only passing papers were
available and hence the study does not deal with any
analysis of papers scored as failures.

The problem of sampling the vast number of papers
was a difficult one since an accurate analysis of
the parameter of the student population was
impossible. A number of conferences were held with
statisticians of the State Department of Education,
University of Michigan and Western Michigan College
of Education. Four assumptions were deemed
defensible as a basis for making the sampling:


1. Complete bundles of papers turned in by schools
should be selected instead of sampling individual
papers. Thus considerations of cross-section of
student population, and socio-economic [...] would
be met beyond reasonable doubt.

2. The size of the bundles sent into the State is
influenced by the size and type of school, as well
as geographical location. In order to take these
factors into account in the sampling, the
distribution of sizes of bundles selected for
analysis should be the same as the distribution of
sizes of those turned in to the State.

3. The satisfaction of the first two assumptions
would depend on the fact that at least 1500 papers
for any one examination should be selected by
random selection of the proper sized bundles from
the total group in storage.

4. In order to assure that the analysis would be
[...] for a field of science, papers from four
consecutive examinations should be analyzed.
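
The second and third assumptions amount to drawing whole bundles at random so that each bundle-size band is represented in about the same proportion as in storage. The following Python sketch shows one way this could be done. The function name `sample_bundles`, the width of the size bands and the bundle sizes are hypothetical illustrations, not the investigators' actual procedure.

```python
import random
from collections import defaultdict


def sample_bundles(bundle_sizes, target_papers=1500, band_width=25, seed=0):
    """Pick whole bundles so that every size band is sampled at the same rate.

    bundle_sizes: number of papers in each stored bundle (hypothetical data).
    band_width:   width of a size band; an assumption, the source gives none.
    Returns the sorted indices of the selected bundles.
    """
    rng = random.Random(seed)
    bands = defaultdict(list)
    for idx, size in enumerate(bundle_sizes):
        bands[size // band_width].append(idx)

    # Fraction of stored papers needed to reach the target.
    fraction = min(1.0, target_papers / sum(bundle_sizes))

    chosen = []
    for members in bands.values():
        # Same sampling fraction in every band, but at least one bundle.
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))

    # Rounding can leave the draw short of the target; top up at random.
    chosen_set = set(chosen)
    remaining = [i for i in range(len(bundle_sizes)) if i not in chosen_set]
    rng.shuffle(remaining)
    while sum(bundle_sizes[i] for i in chosen) < target_papers and remaining:
        chosen.append(remaining.pop())
    return sorted(chosen)


# Hypothetical storage: many small bundles, fewer large ones.
sizes = [10] * 100 + [40] * 50 + [120] * 10
picked = sample_bundles(sizes)
print(len(picked), sum(sizes[i] for i in picked))
```

Sampling whole bundles keeps each school's papers together, which is what the first assumption asks for.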
In the final study these considerations were met
with the following modifications:

1. It was decided to analyze the examinations in
the areas of Biology, Chemistry, Earth Science and
Physics for January 25, 1949; June 21, 1949; January
24, 1950; and June 20, 1950. It will be noted that
four consecutive examinations were not used. The
examinations prepared for August were disregarded
for two reasons:

a) The persons taking Regents Examinations in
Science in August are atypical of those taking
them in January and June. Inordinate proportions
consist of (1) poor students who failed the
examination at one of the prior periods and are
repeating the examination after further study in
summer session, and (2) good students who are
attempting to accelerate their high-school [...]
via summer study.

b) Frequently, 1500 examinations in each of the
various areas of science are not forwarded to the
State in August, and hence assumption 3 could not
have been met.

2. It was decided to analyze approximately 2000
rather than 1500 papers for each of the examinations
and periods in question in order to further insure a
proper sampling. It was not possible to obtain 2000
examinations for Earth Science for any of the
periods under consideration since in each case the
total numbers of papers turned in were less than
2000. However, all those turned in were analyzed.

The process of sampling the papers was undertaken
through the joint efforts of the Examinations Bureau
and the office of the Supervisor of Science
Education. The papers were shipped at four separate
periods, in each case one year after they had been
forwarded to the State Department of Education.
Table I lists the numbers of papers that were
analyzed.


Tallying the Scores on the Regents Examination 
in Science 


Any analysis of the examinations demanded, 
of course, a means for tabulating the points of 
credit obtained by the various students on the 
various items of the examinations. It was de- 
cided therefore to prepare a tally sheet that 
would be suitable for tallying the different ex- 
aminations. 

The Regents Examinations in Science are di- 
vided into two parts. Part I of all these exam- 
inations consists of fifty items of the modified 
true-false, completion and multiple-choice 
types. The following are examples: 


A. Modified true-false type: 


“30. Light is transmitted through a vacuum.
30. ..........”


(For each correct statement, the word true is
written on the line following the item. If the
statement is incorrect the term that must be
substituted for the italicized term to make the
statement correct is written on the line following
the item.)

B. Completion type:


“5. An object with an excess of electrons is
charged ..........   5. ..........”


C. Multiple-Choice type:


“14. The liquid which contracts when heated from
3° to 4° C. is (1) alcohol (2) kerosene (3) mercury
(4) water.   14. ..........”


All the items on Part I give one point credit and
are scored either right or wrong with no partial
credit.

It was necessary therefore to develop a tally sheet
on which the scores obtained by every student on
every examination item could be tallied. Also later
in the study it would be necessary to compute the
coefficients of reliability of Part I of the
examinations by means of the odds-evens technique.
Hence it was decided to develop a tally sheet that
would serve both these purposes. The section of the
sheet for tallying Part I was divided so that the
total number of points obtained on the even-numbered
items could be computed separately from the total
number obtained on the odd. Thus it was possible to
obtain a score of twenty-five points on each of
these halves. The numbers on the vertical ordinate
of the tally sheet designate the number of the item;
those on the horizontal ordinate designate the
various students whose papers were tallied. Each
student retained the same numerical designation
throughout the tally sheet.

If an item on Part I was answered in error, a dot
was placed in the proper square. When all items were
so tallied, the numbers of errors for the odd items,
and for the even items, were totaled. From these
totals were computed the total scores obtained by
the students on the odd items, even items and on
both. It should be stated here that papers of all
students receiving the same total score were tallied
on the same sheet. Thus papers scored [...] were
tallied on one sheet, those scored 65 on another,
those scored 66 on another, and so on. The scores
obtained on the odd and the even items, together
with the total score on Part I, were then entered in
the appropriate spaces at the top of the tally
sheet. A sample tally is shown on the next page.

The tally reads as follows:

For Part I, Student 1 gave incorrect responses to
items 3, 21, and 37 for a score of 22 on the odd
items. He gave incorrect responses to items 22 and
48 for a score of 23 on the even items. His total
score for Part I is 45.
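
The arithmetic behind this reading of the tally can be sketched in a few lines of Python. The function name `part_one_scores` is a hypothetical illustration; the rule it applies is the one stated in the text: one point per item, and twenty-five odd and twenty-five even items.

```python
def part_one_scores(wrong_items, n_items=50):
    """Return (odd_score, even_score, total) for Part I.

    wrong_items: item numbers answered incorrectly (the dots on the sheet).
    Every item is worth one point, so each half is worth n_items // 2 points.
    """
    half = n_items // 2
    odd_errors = sum(1 for item in wrong_items if item % 2 == 1)
    even_errors = sum(1 for item in wrong_items if item % 2 == 0)
    odd_score = half - odd_errors
    even_score = half - even_errors
    return odd_score, even_score, odd_score + even_score


# Student 1 in the sample tally missed items 3, 21, 37, 22 and 48.
print(part_one_scores([3, 21, 37, 22, 48]))  # (22, 23, 45)
```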

Part II, however, offered different problems with
respect to tallying. It consists of eight or nine
items, each worth ten points, of which the student
may elect five for a possible total of fifty points.
The items on Part II, however, vary from one another
with respect to the numbers of parts they contain
and the points of credit assigned to the parts.

In taking Part II of the examination, the student
completely rejects either three or four items.

A second [...] was developed to handle the various
[...] various parts of the items are entered in the
appropriate squares. The first row across each block
is labelled “Comp. Reject.” A check is placed in
this row if a student rejected an entire item. The
spaces marked “Reject” are checked if a student
rejected a sub-part of an item (if given a choice).
In the case of examinations having nine items, an
additional block was affixed to the bottom of the
sheet.


The total scores obtained by the students on the
items they elected on Part II were then entered in
the appropriate spaces at the top of the tally
sheet. (See sample tally for Part I for space
allotted to total score on Part II.)


A sample tally for Part II is shown on page 48.


The next task was to develop a means for
summarizing the scores thus tallied so that they
could be analyzed statistically. As a result
three summary sheets were prepared.

Summary Sheet I was a duplicate of the top section
of the large tally sheet except for spaces on the
lower right in which were computed the horizontal
totals. It was from this sheet that the scores were
taken for computing coefficients of reliability for
Part I, and for computing coefficients of
consistency between the scores obtained by the
students on Part I and Part II. A separate sheet was
used for summarizing each tally sheet.

Summary Sheet II was designed to summarize the total
number of errors, the total number of correct
answers, and the average scores for each of the
items on Part I. A separate sheet was used for all
papers receiving the same total scores.

Summary Sheet III was designed to summarize the
items on Part II. A separate sheet was used for all
papers receiving the same total score. The following
is an explanation of the meaning of the symbols in
the blocks.


PP = total possible points. Computed
by multiplying the value of the sub-
part by the number of persons who
elected the sub-part.


PE = total points earned. This was the
total number of points received by
the persons electing the sub-part.


PM = total points missed. PP minus PE.


PR = the number of persons who rejected
certain sub-parts of an item if
a choice was given.


Av. Sc. = average score. This was obtained
by dividing the total possible points
(PP) by the number of persons who
elected that sub-part of the item.


(Percentage Score: This was entered in the
margin and was obtained by dividing the points
earned (PE) by the total possible points (PP).)

TA = total number of persons electing
an entire item.


TR = total number of persons rejecting
an entire item.


TA plus TR equals the total number of persons
whose papers were summarized on the sheet.


[Sample tally sheet, Part I: item numbers down the
side, students across the top. For Student 1 the
spaces at the top read Odds 22, Evens 23, Part I 45.]

[Sample tally sheet, Part II: one block per item,
with a “Comp. Reject” row followed by “Reject” and
“Points” rows for each sub-part.]

The tally for Part II reads as follows: Student 1
elected item 1 having sub-parts A1, A2, B1, B2, and
B3. He received 1 point on part A1 and 2 points on
each of the other parts. He decided to reject
completely item 2 having parts A1, A2, B1, B2, and
B3. He elected item 3 having parts A, B, C, D, E and
F and chose to reject, as was his privilege, part C.
He received one point of credit on Parts B and F,
and 2 on each of Parts A, D and E for a total of
8 points.
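
The scoring of a single elected Part II item, with one sub-part rejected, can be sketched as follows. The function name `part_two_item_score` and the dictionary layout are hypothetical; the point values are those read from the sample tally for item 3 of Student 1.

```python
def part_two_item_score(points_by_subpart, rejected=()):
    """Sum the credit earned on one elected Part II item.

    points_by_subpart: sub-part label -> points the student received.
    rejected:          labels the student chose not to answer.
    """
    return sum(
        points
        for label, points in points_by_subpart.items()
        if label not in rejected
    )


# Item 3 of Student 1: part C rejected, 1 point on B and F, 2 on A, D and E.
item_3 = {"A": 2, "B": 1, "D": 2, "E": 2, "F": 1}
print(part_two_item_score(item_3, rejected={"C"}))  # 8
```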


The nine blocks, of course, were designed for
summarizing separately the scores obtained on the
various sub-parts of the eight or nine items found
on Part II of the examinations. In the left columns
of the various blocks were entered the numbers and
letters designating the various sub-parts of the
items.

The uses made of the computations found on the
various Summary Sheets will be indicated in later
pages of this report.

Sample Summary Sheets that are filled out are shown
on the next three pages. (Note: A check mark below a
grade denotes an error in the scoring of a student's
paper.)
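
The Summary Sheet III figures for one sub-part (PP, PE, PM and the percentage score) can be sketched in Python as below. The function name and the sample points are hypothetical. One assumption should be noted: the text defines the average score as PP divided by the number of electors, which would simply return the value of the sub-part, so this sketch assumes the intended numerator was the points earned (PE).

```python
def summarize_subpart(value, points_earned):
    """Summary figures for one sub-part of a Part II item.

    value:         points of credit the sub-part is worth.
    points_earned: one entry per student who elected the sub-part
                   (must not be empty).
    """
    electors = len(points_earned)
    pp = value * electors        # total possible points
    pe = sum(points_earned)      # total points earned
    pm = pp - pe                 # total points missed
    return {
        "PP": pp,
        "PE": pe,
        "PM": pm,
        "average": pe / electors,      # assumption: PE, not PP, is divided
        "percentage": 100 * pe / pp,
    }


# Hypothetical sub-part worth 2 points, elected by four students.
print(summarize_subpart(2, [2, 1, 0, 2]))
```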


SECTION III 


THE RELIABILITY, CONSISTENCY AND VALIDITY
OF THE REGENTS EXAMINATION
IN SCIENCE


The Problem 


THE PROBLEM of this phase of the investigation is
(1) to describe the methods used in computing the
reliability, consistency and validity of the Regents
Examinations in Science, and (2) to report the
results of these computations.


Methods Employed 


It is obvious that any measure of the reliability,
consistency and validity of the Regents Examinations
in Science would involve computing coefficients of
correlation. The device chosen for use in this study
was the Pearson r. The scatter diagram technique
cited by Guilford2 was employed for making the
computations.

As will be described more fully later, a small
number of classes were used, in certain cases, for
grouping the data into intervals. Thus the estimates
of correlation were lowered to some degree. It was
decided therefore to correct for errors in grouping,
using [...] prepared by Peters and Van Voorhis.3 In
terms of a formula the correction is as follows:

$$ r_c = \frac{r}{C_x C_y} $$

in which $r_c$ is the corrected coefficient of
correlation, $r$ is the coefficient of correlation
as computed from the coarsely grouped data, and
$C_x$ and $C_y$ are the correction factors based on
the number of class intervals in X and Y
respectively. The use of this correction seems
justified since the assumptions underlying its use
were met.
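
The correction for coarse grouping is a single division, sketched below in Python. The function name and the factor values of 0.97 are hypothetical placeholders; the actual factors come from the Peters and Van Voorhis material cited in the text and are not reproduced here.

```python
def correct_for_grouping(r, c_x, c_y):
    """Return the corrected coefficient r / (c_x * c_y).

    r:        correlation computed from the coarsely grouped data.
    c_x, c_y: correction factors for the number of class intervals in
              X and Y; each is assumed to lie in (0, 1].
    """
    if not (0 < c_x <= 1 and 0 < c_y <= 1):
        raise ValueError("correction factors are assumed to lie in (0, 1]")
    return r / (c_x * c_y)


# Hypothetical factors, for illustration only.
print(round(correct_for_grouping(0.70, 0.97, 0.97), 3))
```

Because both factors are at most 1, the corrected coefficient is never smaller in absolute value than the grouped one, which matches the statement that grouping lowers the estimates.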


Computing Coefficients of Reliability 


There are three common methods for computing
coefficients of reliability, namely, the split-half,
alternate-form, and multiple-administration. Since
only one form of a Regents Examination in Science is
prepared, and that administered only once, the only
method suitable for use in this study was the
split-half.

This method, however, could not be used in
determining the reliability of an entire Regents
Examination in Science. A casual survey of a sample
examination indicates clearly that the split-half
method is applicable to Part I only. Hence, it was
decided to compute the reliability of Part I without
regard for the scores on Part II.

The scores from Summary Sheets I were then
transferred to a scatter diagram. Those the students
obtained for the odd items on Part I were tallied on
the horizontal ordinate, those for the even, on the
vertical. The tallies were made on the basis of two
point intervals, namely 6-7, 8-9, and so on up
through 25 points. The point score of 6 was set as
the lower limit since it was highly improbable that
a lower score on either the odd or even items on
Part I would have appeared on a passing examination
paper.

The coefficients of reliability were then computed
and adjusted with the Spearman-Brown formula, and
corrections were made for coarse grouping of data.
Table II lists the results.
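
The split-half computation can be sketched as follows: a Pearson r between the odd and even half scores, stepped up with the Spearman-Brown formula for a test of doubled length, 2r / (1 + r). The half scores below are hypothetical, and the sketch works from ungrouped scores rather than the two-point intervals of the scatter diagram.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


# Hypothetical odd and even half scores (each out of 25) for six papers.
odd = [22, 18, 25, 15, 20, 12]
even = [23, 17, 24, 16, 18, 14]
r_half = pearson_r(odd, even)
print(round(r_half, 3), round(spearman_brown(r_half), 3))
```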


Computing Coefficients of Consistency 


Since it was not possible to use the scores on
Part II for computing coefficients of reliability,
it was decided to compute coefficients of
correlation between the total scores the students
obtained on Part II and those they obtained on Part
I. It was assumed that such computations (referred
to here as “coefficients of consistency”) might show
the relationship between the abilities of students
to answer correctly the “factual” items on Part I
and their abilities to answer the “thought” items on
Part II.

The coefficients of consistency were computed and
corrected for coarse grouping of data. Table III
lists the results.


[Sample Summary Sheets I, II and III, filled out,
appeared at this point. Summary Sheet III is headed
“Summary of Part II” for papers “Scored at 78,” with
columns PP, PE, PM, PR, Av. Sc. and percentage; the
individual entries are not legible.]


TABLE I


NUMBERS OF EXAMINATIONS RECEIVED

[Columns: Field of Science; Dates of Examinations;
Total. Rows: Biology, Chemistry, Earth Science,
Physics. Legible totals: Biology 8612; Chemistry
8014; a third total, 6587, cannot be assigned to a
row with certainty. The counts by date are not
legible.]


TABLE II


COEFFICIENTS OF RELIABILITY FOR PART I OF THE
REGENTS EXAMINATIONS IN SCIENCE

[Columns: Date of Examination; Biology; Chemistry;
Earth Science; Physics. Rows: January 1949; June
1949; January 1950; June 1950. The coefficients are
not legible.]


TABLE III


COEFFICIENTS OF CONSISTENCY BETWEEN PARTS I AND II
OF THE REGENTS EXAMINATIONS IN SCIENCE

[Columns: Date of Examination; Biology; Chemistry;
Earth Science; Physics. Rows: January 1949; June
1949; January 1950; June 1950. The coefficients are
not legible.]


Conclusions


Insofar as the techniques used in this phase of the
investigation may be defensible, the following
conclusions seem valid:


1. Most of the coefficients of reliability found in
Table II (ten of sixteen) were .75 or higher. These
values are considerably higher than similar
computations for teacher-made tests, at least
insofar as the available research evidence
indicates. The values, however, are somewhat lower
than those usually found for coefficients of
reliability for standardized achievement tests in
science.

2. It must be kept in mind that the coefficients of
reliability were computed for only the fifty points
on Part I of the various tests rather than for the
total one hundred points. Had some technique been
available for including the points obtained on Part
II, a higher degree of reliability might have been
indicated.

3. The Regents Examinations in Science are prepared
for three examination periods during the year,
whereas a standardized examination remains
essentially the same from year to year except for
occasional revisions. Thus any criticisms of the
reliability must be tempered in the light of this
fact.

4. The coefficients of consistency in Table III for
total ranges of scores were not generally as high as
the coefficients of reliability found in Table II.
Thus there is no assurance that Part I and Part II
have the same relative degree of difficulty for the
students.


Computing Coefficients of Validity 


Ordinarily the validity of a measuring instrument is
determined by comparing the scores obtained on that
instrument with criterion data for the factor being
measured. In the case of this study only one measure
for each student was available. Thus there was no
possibility of comparing the scores the students
obtained on the Regents Examinations in Science with
the scores they obtained on any other measure.
Hence, it was decided to use an internal criterion.

The various examinations were submitted to at least
five members of the National Association for
Research in Science Teaching, who taught science
education at the college or university level and
who, at one time or another, had worked on some
phase of test construction. They were asked to
identify on Part II for each of the examinations, an
item or part of an item that could be considered
defensibly to be a measure of each of the following
objectives:


a. Ability to apply scientific principles 

b. Possession of scientific attitudes 

c. Ability to employ problem-solving skills 
(elements of scientific method) 


If a majority of the specialists identified a
certain item or part of an item as being a measure
of one of the objectives listed above, the scores on
these items or parts thereof were considered
tentatively as being criterion data for these
objectives. The items thus identified were then
resubmitted to all the specialists, who were asked
to examine carefully the items (or parts) and to
make their judgments as to whether they could be
considered defensibly as being measures of the
objectives. Only those items considered suitable by
four of the five judges were retained. In some cases
the specialists failed to identify a suitable
criterion item. In the case of these examinations
computations for validity were omitted.

Table IV lists the various items or parts of items
that were identified as being suitable for the
intended purpose. The scores obtained by the
students on the total test were then plotted on the
horizontal ordinate of the correlation chart and
those obtained by the students on the criterion
items were plotted on the vertical ordinate. The
computations for validity were then made and
corrected in the same manner as those for
consistency. Tables V, VI, and VII list the
coefficients of validity thus computed.
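
The two-stage selection of criterion items can be sketched as a simple filter: an item is tentatively accepted when a majority of the judges nominate it, and retained when at least four of five approve it on resubmission. The function name, the item labels and the vote counts below are hypothetical, and a panel of exactly five judges is assumed.

```python
def select_criterion_items(nominations, approvals, n_judges=5, required=4):
    """Apply the two-stage rule for choosing criterion items.

    nominations: item label -> number of judges who first identified it.
    approvals:   item label -> number of judges approving on resubmission.
    """
    tentative = [
        item for item, votes in nominations.items() if votes > n_judges / 2
    ]
    return [item for item in tentative if approvals.get(item, 0) >= required]


# Hypothetical votes for three Part II items.
nominations = {"2a": 4, "5b": 3, "7": 2}
approvals = {"2a": 5, "5b": 3, "7": 5}
print(select_criterion_items(nominations, approvals))  # ['2a']
```

An empty result corresponds to the cases in which no adequate criterion was found and the validity computation was omitted.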


Conclusions 


Insofar as the techniques used in this phase 
of the investigation may be defensible, the fol- 
lowing conclusions seem valid: 


1. An examination of Table VI indicates that for
none of the examinations for Chemistry and Earth
Science was an item or part of an item on the
respective Part II considered to be a measure of the
possession of scientific attitudes.

2. Tables V through VII indicate that the
coefficients of validity for the various objectives
differ greatly. In some cases they can be considered
high, in other cases, low. Thus it is difficult to
generalize with respect to the validity of the
different Regents Examinations in Science.

3. In general, one might state that the Regents
Examinations in Science are better measures of the
ability to apply scientific principles than to use
elements of scientific method. The data for
scientific attitudes are not sufficiently extensive
to warrant a conclusion.


4. As compared with the validity of teacher-made
tests that of the Regents Examinations in Science
is, in general, higher.

5. It should be kept in mind that the methods for
computing the coefficients of validity are not the
ones ordinarily used. Hence, the above conclusions
must be evaluated in terms of this fact.


TABLE IV


NUMBERS OF ITEMS, OR PARTS OF ITEMS, USED AS
CRITERION DATA*

[Columns: Examination and Date; Number of Item or
Part of Item for Understanding of Scientific
Principles, Possession of Scientific Attitudes, and
Ability to Use Elements of Scientific Method. Rows:
Biology, Chemistry, Earth Science and Physics, each
for Jan. 1949, June 1949, Jan. 1950 and June 1950.
The column positions of the entries are not
recoverable.]

*The dashes indicate that no item was judged as
being suitable to serve as criterion data.


TABLE V


COEFFICIENTS OF VALIDITY (UNDERSTANDING OF
SCIENTIFIC PRINCIPLES)

[Columns: Date of Examination; Biology; Chemistry;
Earth Science; Physics. The entries are not
legible.]


TABLE VI

COEFFICIENTS OF VALIDITY (SCIENTIFIC ATTITUDES)

Date of
Examination    Biology        Chemistry      Earth Science  Physics

January 1949   No Adequate    No Adequate    No Adequate    No Adequate
               Criterion      Criterion      Criterion      Criterion

June 1949      .57 ± .02      No Adequate    No Adequate    .55 ± .01
                              Criterion      Criterion

January 1950   [illegible]    No Adequate    No Adequate    .58 ± .02
                              Criterion      Criterion

June 1950      [illegible]    No Adequate    No Adequate    [illegible]
                              Criterion      Criterion


TABLE VII

[The caption and most entries of this table are not
legible.]



SECTION IV 


AN INVESTIGATION OF THE RELATIVE
ACHIEVEMENTS OF MALES AND FEMALES,
AND RURAL AND URBAN STUDENTS,
ON THE REGENTS EXAMINATIONS
IN SCIENCE4


The Problem 


THE PROBLEM of this phase of the investigation is
to determine whether achievement on the Regents
Examinations in Science (1) varies with the sex of
the student, and (2) varies with the size of the
school in which the student is enrolled.

Methods Employed 


In practically every level of science education two
questions arise frequently. The first deals with the
relative achievements of boys and girls; the second,
with the relative achievements of rural and urban
students. The vast numbers of examination papers
that were tallied in this investigation made
possible a study [...].

The first step in this phase of the investigation
was to tabulate, for each separate bundle of papers,
the following information:

1. The field of science for which the examination
was prepared

2. The date for which the examination was prepared

3. The name and location of the school from which
the examination papers were received

4. The population of the school for the school year
in which the examination was prepared. (Note: It
was decided to accept the school enrollment for the
year 1948-49 as the base population, since it did
not, in most cases, differ sufficiently from that
of 1949-50 to make a distinction.)

The information for the first three points was found
on the examination papers. That for point 4 was
found in the Forty-Sixth Annual Report of the [...]
Department for the School Year [...] 30, 1949,
volume 2, entitled Statistics (Albany: University of
the State of New York [...], pp. 353). The
information for the populations of high schools in
the City of New York was not listed in this
publication. It was, however, obtained from the
Supervisor of Science Education of the State
Department of Education.

The first step was to tabulate separately for males
and females the scores made on Part I, Part II and
on the total examination. This task was relatively
simple.

However, it was more difficult to classify a school
as being urban or rural in character. Hence, rather
than tally scores according to this method of
classification it was decided to tally them
according to the size of the school in which the
students were enrolled. For the purpose of this
investigation the Southern Michigan classification
for size of high school was used. As set up in the
Handbook of the Michigan High School Athletic
Association for the School Year of 1952-53 the
classification is as follows:


1. Class A - over 800 students

2. Class B - 325 to 799 students

3. Class C - 150 to 324 students

4. Class D - less than 150 students


This classification, however, did not prove to be
completely satisfactory. It did not seem reasonable
to classify the students of a school with an
enrollment of 1000 with those from a large New York
City high school, such as James Madison School with
a population of over 6000. Therefore an additional
classification was added, namely AA, or schools with
a population of over 1500.
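
The five-class scheme can be written as a small lookup, sketched here in Python. The function name `school_class` is hypothetical. The published limits leave an enrollment of exactly 800 unassigned (Class A is "over 800" and Class B ends at 799), so the sketch assumes such a school falls in Class A.

```python
def school_class(enrollment):
    """Return the size class (AA, A, B, C or D) for a school enrollment."""
    if enrollment > 1500:
        return "AA"
    if enrollment >= 800:  # assumption: exactly 800 is treated as Class A
        return "A"
    if enrollment >= 325:
        return "B"
    if enrollment >= 150:
        return "C"
    return "D"


# The two schools contrasted in the text fall into different classes.
print(school_class(1000), school_class(6000))  # A AA
```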

A copy of the sheet on which the scores were 
tabulated is shown on the next page. 

In order to determine the significance of the
differences that might exist among the scores on the
basis of sex and class of school, it was decided to
use the analysis of variance technique with the
double entry table described by Lindquist.5 However,
there was a great inequality in replications since
papers were sampled from those contributed by the
various classes of schools in the same proportions
as the various classes contributed papers to the
total number sent into the state. As a result the
factor of non-orthogonality was present in the
design. The procedure for the final “F” test was the
one suggested by Snedecor6 for use with two-way
classifications with unequal replications in which
corrections are made for non-orthogonality.
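
The starting point of such an analysis, the correction term T²/N and the preliminary (uncorrected) sums of squares for a sex-by-class table with unequal cell sizes, can be sketched as follows. This is only the preliminary stage: the adjustment for non-orthogonality attributed to Snedecor is not implemented, and the function name and scores are hypothetical.

```python
def preliminary_sums_of_squares(cells):
    """Preliminary sums of squares for a two-way table with unequal cells.

    cells: (sex, school_class) -> list of scores in that subclass.
    """
    scores = [s for group in cells.values() for s in group]
    correction = sum(scores) ** 2 / len(scores)  # T squared over N
    total_ss = sum(s * s for s in scores) - correction

    def between(position):
        # Pool the subclasses that share one level of the chosen factor.
        pooled = {}
        for key, group in cells.items():
            pooled.setdefault(key[position], []).extend(group)
        return sum(sum(g) ** 2 / len(g) for g in pooled.values()) - correction

    subclass_ss = sum(sum(g) ** 2 / len(g) for g in cells.values()) - correction
    sex_ss, class_ss = between(0), between(1)
    return {
        "sex": sex_ss,
        "class": class_ss,
        "interaction": subclass_ss - sex_ss - class_ss,  # uncorrected
        "within": total_ss - subclass_ss,
        "total": total_ss,
    }


# Hypothetical scores with unequal numbers per subclass.
cells = {("M", "A"): [1, 2], ("M", "B"): [3], ("F", "A"): [2], ("F", "B"): [4, 6]}
print(preliminary_sums_of_squares(cells))
```

With these hypothetical scores the uncorrected interaction term comes out negative, which illustrates why unequal replications call for a correction before the F test.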

A copy of the analysis sheet used for this purpose is shown on page 59.

Table VIII presents the results of the computations just described.

It may be noted that in several cases the variances
with respect to interaction are significant.
These occur on Parts I and II, and total score of
the Biology Examination for June 1950; on Parts
I and II of the Chemistry Examination for January
1949; on Part II of the Chemistry Examination
for June 1949; and on Parts I and II, and


[Score tabulation sheet. Heading fields: NAME OF EXAMINATION, examination year (1948-49 or 1949-50), NAME OF SCHOOL, POPULATION. Columns, repeated across the sheet: Student, Part I, Part II, Total.]


ANALYSIS OF VARIANCE (Two-way classification)

[Analysis sheet. Heading fields: Examination, Date, Part(s), Class of School. Working space is provided for the correction term T²/N and the intermediate sums of squares. Table columns: Source of variance, Preliminary sum of squares, Corrected sum of squares, Mean square, F. Legible rows: Class, Interaction, Sub-total, Within.]


TABLE VIII

ANALYSES OF VARIANCES

Examination            | Part(s) | Source of Variance | Significance
Biology, January 1949  | I       | sex                | sig.
                       |         | class              | sig.
                       |         | interaction        | not sig.
                       | II      | sex                | not sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.
                       | I & II  | sex                | sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.
Biology, June 1949     | I       | sex                | not sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.
                       | II      | sex                | sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.
                       | I & II  | sex                | not sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.
Biology, January 1950  | I       | sex                | very sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.
                       | II      | sex                | not sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.
                       | I & II  | sex                | not sig.
                       |         | class              | very sig.
                       |         | interaction        | not sig.

[The Interpretation columns "High" and "Low" are illegible in the source.]


TABLE VIII (Continued)

Examination              | Part(s) | Source of Variance | Significance
Biology, June 1950       | I       | sex                | sig.
                         |         | class              | very sig.
                         |         | interaction        | very sig.
                         | II      | sex                | sig.
                         |         | class              | very sig.
                         |         | interaction        | sig.
                         | I & II  | sex                | sig.
                         |         | class              | very sig.
                         |         | interaction        | sig.
Chemistry, January 1949  | I       | sex                | very sig.
                         |         | class              | very sig.
                         |         | interaction        | sig.
                         | II      | sex                | very sig.
                         |         | class              | very sig.
                         |         | interaction        | sig.
                         | I & II  | sex                | very sig.
                         |         | class              | very sig.
                         |         | interaction        | ...
Chemistry, June 1949     | I       | sex                | ...
                         |         | class              | ...
                         |         | interaction        | ...
                         | II      | sex                | ...
                         |         | class              | ...
                         |         | interaction        | ...
                         | I & II  | sex                | ...
                         |         | class              | ...
                         |         | interaction        | ...

[Entries shown as "..." and the Interpretation columns "High" and "Low" are illegible in the source.]


TABLE VIII (Continued)

Examination                  | Part(s) | Source of Variance | Significance
Chemistry, January 1950      | I       | sex                | very sig.
                             |         | class              | very sig.
                             |         | interaction        | not sig.
                             | II      | sex                | not sig.
                             |         | class              | very sig.
                             |         | interaction        | sig.
                             | I & II  | sex                | sig.
                             |         | class              | very sig.
                             |         | interaction        | not sig.
Chemistry, June 1950         | I       | sex                | very sig.
                             |         | class              | sig.
                             |         | interaction        | sig.
                             | II      | sex                | not sig.
                             |         | class              | sig.
                             |         | interaction        | sig.
                             | I & II  | sex                | very sig.
                             |         | class              | sig.
                             |         | interaction        | sig.
Earth Science, January 1949  | I       | sex                | not sig.
                             |         | class              | very sig.
                             |         | interaction        | not sig.
                             | II      | sex                | not sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.
                             | I & II  | sex                | not sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.

[The Interpretation columns "High" and "Low" are illegible in the source.]


TABLE VIII (Continued)

Examination                  | Part(s) | Source of Variance | Significance
Earth Science, June 1949     | I       | sex                | very sig.
                             |         | class              | very sig.
                             |         | interaction        | not sig.
                             | II      | sex                | very sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.
                             | I & II  | sex                | very sig.
                             |         | class              | sig.
                             |         | interaction        | not sig.
Earth Science, January 1950  | I       | sex                | not sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.
                             | II      | sex                | very sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.
                             | I & II  | sex                | not sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.
Earth Science, June 1950     | I       | sex                | not sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.
                             | II      | sex                | not sig.
                             |         | class              | very sig.
                             |         | interaction        | not sig.
                             | I & II  | sex                | not sig.
                             |         | class              | not sig.
                             |         | interaction        | not sig.

[The Interpretation columns "High" and "Low" are illegible in the source.]


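TABLE VIII (Continued)

Examination            | Part(s) | Source of Variance | Significance | High | Low
Physics, January 1949  | I       | sex                | very sig.    |      | girls
                       |         | class              | very sig.    |      | B
                       |         | interaction        | not sig.     | --   | --
                       | II      | sex                | not sig.     | --   | --
                       |         | class              | sig.         |      | A
                       |         | interaction        | not sig.     | --   | --
                       | I & II  | sex                | very sig.    |      | girls
                       |         | class              | very sig.    |      | A
                       |         | interaction        | not sig.     | --   | --
Physics, June 1949     | I       | sex                | not sig.     | --   | --
                       |         | class              | very sig.    |      | C
                       |         | interaction        | not sig.     | --   | --
                       | II      | sex                | not sig.     | --   | --
                       |         | class              | sig.         |      | C
                       |         | interaction        | not sig.     | --   | --
                       | I & II  | sex                | not sig.     | --   | --
                       |         | class              | very sig.    |      | C
                       |         | interaction        | not sig.     | --   | --
Physics, January 1950  | I       | sex                | very sig.    |      | girls
                       |         | class              | very sig.    |      | D
                       |         | interaction        | not sig.     | --   | --
                       | II      | sex                | very sig.    |      | girls
                       |         | class              | sig.         |      | B
                       |         | interaction        | not sig.     | --   | --
                       | I & II  | sex                | very sig.    |      | girls
                       |         | class              | very sig.    |      | D
                       |         | interaction        | not sig.     | --   | --

[Only one Interpretation entry per row is legible in the source; it is placed under "Low" because the text reports boys as superior on the Physics Examinations. Blank "High" entries are illegible.]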


TABLE VIII (Continued)

Examination         | Part(s) | Source of Variance | Significance
Physics, June 1950  | I       | sex                | not sig.
                    |         | class              | sig.
                    |         | interaction        | not sig.
                    | II      | sex                | not sig.
                    |         | class              | sig.
                    |         | interaction        | not sig.
                    | I & II  | sex                | not sig.
                    |         | class              | sig.
                    |         | interaction        | not sig.

[The Interpretation columns "High" and "Low" are illegible in the source.]


total score of the Chemistry Examination for
June 1950.

In these cases it is possible that differences
in curriculum, and the organization and administration
of the schools are the causes of observable
variances, rather than the factors of size
of school and sex.

However, in the cases where the observed
variances could be attributed reasonably to sex
or size of school, the following observations
seem defensible:


1. On the various Biology Examinations,
boys are significantly better than girls on two
occasions, and girls are significantly better
than boys on two occasions. Students from AA
schools prove to be superior on nine occasions,
while students from class D schools prove to be
the lowest on five occasions, from class A on
three occasions, and from class B on one occasion.

2. On the various Chemistry Examinations,
boys are significantly better than girls on four
occasions, while in no case did the girls prove
to be significantly better than the boys. Students
from the AA schools prove to be superior
in five cases, while students from class B
schools are lowest in one case, from class C
in two cases, and from class D in two cases.

3. On the various Earth Science Examinations,
boys are superior in three cases, while
girls are superior in one. Students from the
AA and A schools each prove to be superior in
one case, and students from class B schools in
two cases. Students from class C schools are
lowest in two cases, and those from class B
schools, once.

4. On the Physics Examinations, boys are
superior in five cases ... girls superior. Students
from the class AA schools are superior ...
from class A schools ...


5. It may be stated, therefore, that in thirty-one
out of forty-eight cases there is no variance
attributable to the sex factor. However, out of
the remaining seventeen cases the variances
are significantly in favor of the boys in fourteen,
and significantly in favor of the girls in three.

6. It may be stated that in eighteen cases
there are no variances attributable to size of
school. In the remaining thirty cases the variances
are significantly in favor of students
from class AA schools in twenty-three, while
in no case are they in favor of students from
class C or D. In nineteen of thirty cases students
from class C and D schools appeared to
exhibit less achievement, at least insofar as
the variances may be criteria.
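
The totals in observations 5 and 6 can be checked against the per-subject counts in observations 1 through 4. The short tally below is only an arithmetic cross-check; the Physics count for girls is taken as zero, which is an inference from the stated total of three rather than a figure legible in observation 4.

```python
# Occasions on which each sex scored significantly higher, by subject.
boys_better = {"Biology": 2, "Chemistry": 4, "Earth Science": 3, "Physics": 5}
girls_better = {"Biology": 2, "Chemistry": 0, "Earth Science": 1,
                "Physics": 0}  # Physics value inferred from the total of 3

TOTAL_CASES = 48           # cases considered in observations 5 and 6
NO_SEX_VARIANCE = 31       # cases with no variance attributable to sex
NO_CLASS_VARIANCE = 18     # cases with no variance attributable to size

boys_total = sum(boys_better.values())
girls_total = sum(girls_better.values())
sex_significant = TOTAL_CASES - NO_SEX_VARIANCE
class_significant = TOTAL_CASES - NO_CLASS_VARIANCE

if __name__ == "__main__":
    print("boys:", boys_total, "girls:", girls_total)
    print("sex significant:", sex_significant)
    print("class significant:", class_significant)
```

The per-subject counts sum to fourteen for boys and three for girls, matching the seventeen cases left after the thirty-one without sex variance, and forty-eight less eighteen gives the thirty cases cited for size of school.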


In conclusion, boys from the large high
schools appear to score significantly higher on
the Regents Examinations in Science than any
other single group, while girls from small high
schools appear to exhibit less achievement than
any other single group.


SECTION V 


THE VOCABULARY LOAD AND LEVEL OF
READING DIFFICULTY OF THE REGENTS
EXAMINATIONS IN SCIENCE


The Problem


THE PROBLEM of this phase of the investigation
is to evaluate the Regents Examinations
in Science with respect to their vocabulary
loads and levels of reading difficulty.


Methods Employed 


The first step was to find a technique for evaluating
the vocabulary load of the Regents Examinations
in Science. The Flesch8 formula (as
well as other reading formulae) was obviously
not suitable for the intended purpose since it is
used ordinarily with passages of at least one
hundred words or more and involves complete
sentences rather than the type of material found
on the Regents Examinations. Some of the examinations,
for example, contain completion
and multiple-choice items that do not adapt themselves
readily to the use of the Flesch formula.
Therefore it was decided to use the word-count
method.

The first step was to tally all the words that
appeared on the sixteen examinations. The words
in the directions for writing the examinations,
however, were not tallied, nor were numbers unless
they appeared as words. Empirical formulas
and structural formulas were not tallied.
All other words, including those found on charts
and diagrams, were tallied.
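
The tallying rule can be illustrated with a short word counter. This is a sketch under stated assumptions, not the investigators' hand procedure: the function name and punctuation handling are hypothetical, and a token is kept only if it is written entirely in letters, so numerals and formulas containing digits are skipped. Formulas written with letters alone (for example NaCl) would need to be removed by hand.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by hyphens or apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def tally_words(text):
    """Count words in text, skipping numerals and digit-bearing formulas."""
    counts = Counter()
    for raw in text.split():
        token = raw.strip(".,;:!?()[]\"'")
        if WORD.fullmatch(token):
            counts[token.lower()] += 1
    return counts


if __name__ == "__main__":
    sample = "Two grams of H2O at 20 degrees; two liters."
    print(tally_words(sample))
```

In the sample sentence "Two" and "two" are tallied together because the number is written as a word, while "20" and "H2O" are not tallied.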

Next, the words thus tallied were divided
into two broad categories: (1) technical words
and (2) non-technical words. The technical category
was further sub-divided into two classifications:
(a) essential, and (b) desirable. The
non-technical words were also divided into two
classifications: (a) difficult, and (b) easy.
These categories and classifications were established
as follows: Letters were sent to [...]ty
teachers who taught in each of the areas of
Biology, Chemistry, Earth Science and Physics
in the State of New York, asking if they
would be willing to evaluate lists of vocabulary
words in their respective teaching fields. A
copy of this letter follows:


October 8, 1952


Dear


At the present time the University of the
State of New York, through its science, research
and statistical divisions, is undertaking
an extensive investigation of the New York State
Regents Examinations in Science. One of the
facets of the investigation is to determine whether
or not the vocabulary load on the examinations
may be excessive.

According to the University, you have at one
time or another taught Earth Science* and have
administered and scored Regents Examinations
in that area. Hence, we have a request to make
of you. Would you be willing to evaluate a list
of terms found on four representative Regents
Examinations in Earth Science? If so, I would
appreciate receiving an indication of your willingness
on a postcard. If you agree, we shall
do the following:

Send you a copy of the list of terms together
with instructions and a self-addressed
envelope in which to return your evaluation.

Give you full credit for your work in the
final report as well as informing your administrator
and board of education of your efforts.

The job should take about one hour and should
be completed one week after receiving the material.
We sincerely hope that you will have
time to help us out.


Sincerely,


George G. Mallinson, Director
Evaluation Program for
Science Regents

(*Note: The area of science named in the letter
depended on the word list the individual was
requested to evaluate.)


In response to this request, many teachers indicated
their willingness to assist. They were then sent a
mimeographed list of all the terms that appeared on
the examinations in their respective teaching
fields and were asked to evaluate each word
according to instructions found in an
accompanying letter, a copy of which follows:


Your recent communication indicated that
you would be willing to assist in evaluating the
vocabulary content of the New York State Regents
Examinations in Physics.* Your cooperation
is more than appreciated.

Enclosed you will find a list of "Terms for
Physics" together with a stamped envelope in
which to return your evaluation. It is not necessary
to sign your name. The following are the
information and instructions for making the evaluation:

1. The list of terms consists of all the words
and terms that appeared on the four successive
Regents Examinations in Physics for January
1949, June 1949, January 1950 and June 1950.

2. The terms may be divided into two categories
(a) non-technical, and (b) technical.

(a) Non-technical terms are those that a person
is likely to use at one time or another
in his everyday conversation, or read in
the newspaper or other literature not concerned
specifically with physics.

(b) Technical terms are those that a student
would encounter specifically in a course
in Physics. While such terms might be
encountered in other courses or other
places, an adequate understanding of the
usual topics and principles of a typical
course in physics would demand an understanding
of, and the ability to use and apply, them.

3. Please examine the list of terms one by
one. If you think a term fits the definition of
"technical term", place an asterisk (*) before
it. If you do not think the term fits the definition,
ignore it. Reexamine your list to see if
your judgment is consistent.

4. Then examine all the terms before which
you placed an asterisk (*). If you believe that
such a term is absolutely essential to an adequate
understanding of topics and principles found in
a typical course in physics, place a second asterisk
(**) before the term. If however you believe
the term to be merely a desirable technical
term, leave it marked with but one asterisk (*).

Again let me say that your cooperation is absolutely
essential and more than appreciated.
In the final report due credit will be given your
efforts and your administrator and board of education
will be notified.

Your evaluation will be appreciated as soon
as convenient and a copy of the final report will
be sent you if you so request.


Sincerely,


George G. Mallinson, Director
Program of Evaluation
New York State Regents Examinations
in Science


lm

Enc. 2

(*Note: The area of science named in the letter
depended on the word list the individual agreed
to evaluate.)


After a period of about four weeks, 23 Biology,
32 Chemistry, 24 Earth Science, and 24
Physics lists were returned. These were then
tallied on a "master tally list" for each of the
subjects.

If a word was checked "essential" by a total
of ten or more respondents it was considered
to be an "essential" term. (For example,
the word "atomic" appearing on the list of
Physics terms was checked "desirable" by five
Physics teachers and "essential" by eleven.)
However, if a word was checked "essential" or
"desirable" by a total of five or more teachers
(but checked "essential" by fewer than ten) it
was considered "desirable." (For example the
word "atmosphere", appearing on the list of
Physics terms was checked "desirable" by seven
Physics teachers, and "essential" by five.
Therefore it was considered "desirable.")
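
The decision rule for the master tally can be stated compactly. The sketch below simply restates the thresholds given above; the function name is hypothetical, and words that reach neither threshold are returned as "non-technical".

```python
def classify_term(essential_votes, desirable_votes):
    """Classify a term from the number of teachers checking it
    "essential" and the number checking it "desirable"."""
    if essential_votes >= 10:
        return "essential"
    if essential_votes + desirable_votes >= 5:
        return "desirable"
    return "non-technical"


if __name__ == "__main__":
    print("atomic:", classify_term(essential_votes=11, desirable_votes=5))
    print("atmosphere:", classify_term(essential_votes=5, desirable_votes=7))
```

Applied to the two examples in the text, "atomic" (eleven essential, five desirable) comes out essential and "atmosphere" (five essential, seven desirable) comes out desirable.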

All words rated as being "essential" or "desirable"
were considered to be part of the technical
vocabulary and hence were not deemed to
be difficult. The remaining words, not rated
as being part of the technical vocabulary, were
considered to be non-technical terms, and therefore
words which a student might find difficult.
These non-technical words were then checked
by means of the Buckingham-Dolch9 word list
in order to determine their grade-levels of difficulty.
It was assumed that the courses in science
would be taken by some students at these
grade levels: Biology, ninth grade; Earth Science,
tenth grade; Chemistry, eleventh grade;
and Physics, eleventh grade. Any non-technical
word was considered to be difficult therefore
if it was rated above these respective
grade levels in the word list.

Non-technical words not appearing in the
Buckingham-Dolch list were also considered to
be difficult.
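
The difficulty test for non-technical words amounts to a table lookup. In the sketch below the course grade levels are those stated above, but the word-list entries and their grade placements are invented for illustration only; they are not taken from the Buckingham-Dolch list.

```python
# Grade level at which each course was assumed to be taken.
COURSE_GRADE = {
    "Biology": 9,
    "Earth Science": 10,
    "Chemistry": 11,
    "Physics": 11,
}

# Hypothetical excerpt of a graded word list (word -> grade placement).
SAMPLE_WORD_GRADES = {"weight": 4, "diagram": 8, "equilibrium": 12}


def is_difficult(word, subject, word_grades):
    """A non-technical word is difficult if it is absent from the graded
    list or is placed above the grade level assumed for the course."""
    grade = word_grades.get(word.lower())
    if grade is None:
        return True
    return grade > COURSE_GRADE[subject]


if __name__ == "__main__":
    for w in ("weight", "equilibrium", "zygomorphic"):
        print(w, is_difficult(w, "Physics", SAMPLE_WORD_GRADES))
```

A word placed at or below the course grade is counted as easy; a word placed higher, or missing from the list altogether, is counted as difficult.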

Table IX lists the numbers of words on the
various Regents Examinations in Science that
fall into the various categories mentioned.


Conclusions


No listing will be made here of the different
words falling into the various categories. However,
insofar as the techniques used in this
study may be valid, the following conclusions
seem justified:


1. The greatest number of technical words
(271) was found on the June 1950 examination
in Biology, the fewest number (213) on the June
1949 examination in Physics. Thus the numbers
of technical words on the different examinations
do not vary greatly. Further it does
not seem likely that the vocabulary load with
respect to technical words is excessive.


2. The greatest number of difficult non-technical
words (8) was found on the June 1950 examination
in Biology, while there were no difficult
non-technical words on the Chemistry examinations
for January and June 1949. Hence it is
rather unlikely that the numbers of difficult
non-technical words are excessive.

3. The findings just indicated fail to show that
there is any justification for criticizing the Regents
Examinations in Science on the basis of
their vocabulary loads and hence their levels of
reading difficulty.


SECTION VI 


ERRORS AND INCONSISTENCIES IN SCORING
THE REGENTS EXAMINATIONS IN SCIENCE


The Problem


THE QUESTIONNAIRE that was sent to
the science teachers in the initial stages of this
investigation revealed that about two-thirds of
them believed that a sampling of the examination
papers should be rechecked after they were
turned in to the State. This would seem to indicate
that the teachers believed that there might
be some errors and inconsistencies in the scoring
of the examinations. Hence, it is the problem
of this phase of the investigation (1) to determine
whether or not the belief is valid; and if
so, (2) to determine the types and frequencies
of scoring errors that appear in the papers analyzed
in this study.


Methods Employed 


While tallying the scores that the students received
on the various items on the examinations,
the investigators recorded the obvious errors
and inconsistencies that appeared in scoring
them. The resulting lists were then studied.
In general, it was found that the errors and
inconsistencies could be classified into these four
major categories:


1. Errors in the addition of points of credit
2. Errors resulting from failure to follow
State-prescribed scoring procedures
3. Inconsistencies and errors in correction
4. Miscellaneous


TABLE IX

NUMBERS OF WORDS BELONGING IN DIFFERENT CLASSIFICATIONS

[Table body illegible in the source. Legible column headings: Field of Science; Total Different Words; Number of Different Non-Technical Words; Total Non-Technical. The only legible entry is the figure 496 in the Earth Science row.]


Findings


1. Errors in the Addition of Points. This
category of error and inconsistency was by far
the most extensive. There were several steps
in the scoring of a paper where such errors in
addition could occur, namely,


a) Errors in adding the Part I and Part II
scores to obtain the total test score.

b) Errors in totaling the Part I score. There
were two major chances for error here;
a mistake could be made (1) in totaling
the number of points to be deducted because
of a student's failure to answer items
correctly; and (2) in subtracting this
number of points from the maximum point
value of fifty for Part I.

c) Errors in totaling the Part II score. In
this case there were several ways in
which the errors could occur. At times
simple errors could be made in totaling
the scores on the parts of items. At other
times errors in subtraction resulted when
the points to be deducted because of error
were totaled and subtracted from ten (the
maximum point value for each individual
Part II item). In still other cases the
correction marks of the teacher were so
light or illegible that they were apparently
overlooked when the points were totaled.
(This latter type of error would,
of course, have been avoided if cumulative
scores had been kept, as suggested
by the State.)


Certain other errors were made in totaling
Part II scores because of the failure to follow
correct scoring procedures. Such cases will
be discussed in the next two parts of this section.
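
The arithmetic behind these totals is simple enough to set down, which makes it easy to see where a slip can enter. The sketch below assumes the point values given in this section (fifty for Part I, ten for each Part II item); the function names and the sample scores are hypothetical.

```python
from itertools import accumulate

PART_I_MAX = 50        # maximum point value for Part I
PART_II_ITEM_MAX = 10  # maximum point value for each Part II item


def part_i_score(points_deducted):
    """Part I score by the deduction method: two steps, two chances to err."""
    return PART_I_MAX - sum(points_deducted)


def cumulative_score(points_awarded):
    """Running positive total, as suggested by the State."""
    return list(accumulate(points_awarded))


def total_score(part_i, part_ii_items):
    """Total test score from the Part I score and the Part II item scores."""
    return part_i + sum(part_ii_items)


def addition_error(recorded_total, part_i, part_ii_items):
    """Recorded total minus correct total; positive favors the student."""
    return recorded_total - total_score(part_i, part_ii_items)


if __name__ == "__main__":
    p1 = part_i_score([1, 1, 2])          # hypothetical deductions
    items = [8, 7, 10, 6, 9]              # hypothetical Part II item scores
    print(p1, cumulative_score(items), total_score(p1, items))
    print(addition_error(88, p1, items))
```

With a cumulative positive score the last running total is the paper's score, so no separate subtraction step is needed.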

It is interesting to note that the greatest
number of errors in making totals accrued to
the benefit of the student, that is, the total
score awarded the paper was higher than the
correct total. For example, out of 2011
Biology Examinations for January 1949,
ninety-two errors in making totals were detected.
Of these, only six scores were lower than
... higher. ... of science.
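
The error counts quoted in this section are easier to compare as rates. The sketch below uses only counts that are stated outright in the text of this section; the labels and the helper name are illustrative.

```python
# (description, papers with the error, papers examined), as quoted in the text
REPORTED_COUNTS = [
    ("Biology, Jan. 1949: errors in making totals", 92, 2011),
    ("Earth Science, June 1949: deduction-method errors", 117, 1699),
    ("Biology, Jan. 1950: item or part left uncorrected", 59, 2441),
    ("Chemistry, June 1949: transposition errors", 118, 1974),
    ("Chemistry, June 1949: obvious upgrading", 22, 1974),
]


def error_rate_percent(errors, papers):
    """Percentage of papers showing the error, to one decimal place."""
    return round(100 * errors / papers, 1)


if __name__ == "__main__":
    for label, errors, papers in REPORTED_COUNTS:
        print(f"{label}: {error_rate_percent(errors, papers)}%")
```

Each rate is a share of the papers examined for that one examination and should not be added across rows, since a single paper may show more than one type of error.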
2. Errors Resulting from Failure to Follow
State-Prescribed Scoring Procedures. The
State of New York issues a manual10 listing the
procedures to be followed in scoring Regents
Examinations. Many of these suggested procedures
were violated by a number of teachers,
and these violations in many instances led to
incorrect examination scores. Examples of
these types of errors follow:

a) Failure to keep a cumulative score. The
State suggests that for each item or part of an
item the points awarded should be indicated on
the test paper and a cumulative positive score
be kept throughout the paper. This means that
the points awarded for answering correctly
each item or part of an item be totaled continuously
as the paper is scored. Hence at the completion
of the last item the resulting cumulative
score will represent the total score of the paper.

By far the greatest number of teachers tallied
the points deducted rather than the points
awarded for the answers to items or parts of items.
Among those who did indicate the points
awarded, many failed to keep cumulative
scores. Obviously the practice of indicating the
number of points deducted is more subject to error
than the method recommended by the State.
The teacher may make a mistake in totaling the
points to be deducted and make another in subtracting
this total from the maximum point value
allotted the item or part of an item. For example,
such errors occurred on 117 out of 1699
papers in Earth Science for June 1949.

b) Failure to score items in sequence. On
Part II of the examinations a student has the option
of selecting five out of a possible eight or
nine items. On some of these items he may
omit one or two parts of the item. Occasionally
all eight or nine items or all parts of a single item
were answered by a student. In certain cases
if one item (or part of an item) that appeared in
the middle was answered poorly or incorrectly,
some teachers skipped it and gave credit for
later sections that were answered more correctly.

The State requires that the items or parts of
items should be scored in order of appearance,
omitting the last item or part. Failure to do so,
of course, may give a student a higher score
than he deserves.

c) Scoring of papers by several different
teachers. The State suggests that one teacher
should score all items on any given examination
paper. However, in many cases, particularly
in larger schools, it was obvious that teachers
worked together on scoring a group of papers.
For example, one teacher might score items
one through ten on a group of papers; another
teacher items eleven through twenty. Such a procedure
often led to inconsistencies and errors
when the total score of a paper was computed.

TABLE X

NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF
POINTS ON THE REGENTS EXAMINATIONS IN BIOLOGY FOR
JUNE, 1949

Total Score | Percentage of Errors | Total Score | Percentage of Errors
...         | ...                  | 82          | 9.7%
...         | 21.0%                | 83          | 15.6%
...         | 19.0%                | 84          | 12.3%
...         | 13.0%                | 85          | 16.0%
...         | 12.5%                | 86          | 13.0%
...         | 11.0%                | 87          | 20.0%
...         | 10.0%                | 88          | 7.9%
...         | 16.7%                | 89          | 14.3%
...         | 19.6%                | 90          | 9.5%
...         | 22.0%                | 91          | 5.3%
...         | 14.8%                | 92          | 21.0%
...         | 52.0%                | 93          | 15.6%
...         | ...                  | 94          | 9.5%
...         | 14.3%                | 95          | ...
...         | ...                  | 96          | 16.7%
...         | ...                  | 97          | 15.4%
...         | 12.2%                | 98          | ...
...         | 10.6%                | 99          | ...
...         | 0                    | 100         | ...

[Entries shown as "..." and the columns "No. of Papers" and "Number of Errors" are illegible in the source.]


TABLE XI

NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF
POINTS ON THE REGENTS EXAMINATIONS IN CHEMISTRY FOR
JANUARY, 1950

[Table body illegible in the source. Column headings: Total Score; No. of Papers; Number of Errors; Percentage of Errors.]


TABLE XII

NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF
POINTS ON THE REGENTS EXAMINATIONS IN EARTH SCIENCE
FOR JANUARY, 1950

Total Score | Percentage of Errors | Total Score | Percentage of Errors
...         | 15.9%                | ...         | 13.6%
...         | 24.0%                | ...         | 10.0%
...         | 15.0%                | ...         | 14.6%
...         | 15.2%                | ...         | 13.9%
...         | 15.7%                | ...         | ...
...         | 12.9%                | ...         | 3.4%
...         | 10.7%                | ...         | 13.3%
...         | 18.0%                | ...         | ...
...         | 37.5%                | ...         | 6.4%

[Total scores and the columns "No. of Papers" and "Number of Errors" are illegible in the source. Further percentages legible without row alignment: 4.1%, 7.1%, 6.7%, 22.2%, 0, 14.3%, 50.0%.]


TABLE XIII

NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF
POINTS ON THE REGENTS EXAMINATIONS IN PHYSICS FOR
JANUARY, 1950

[Total scores and the columns "No. of Papers" and "Number of Errors" are illegible in the source. Percentages of errors legible, in order: 15.3%, 6.9%, 4.7%, 3.3%, 6.4%, 2.2%, 2.1%, 0, 4.2%, 2.8%, 0, 0.]


d) Counting scores below 62 as passing. The
cutting score for passing papers has been set by
the State at 65. However, recognizing that errors
may occur in the correction of papers, a
three percent correction error has been allowed.
Thus scores from 62 to 64 are considered as
"passing". These "below level" scores 62,
63 and 64 are recorded as (65). In several
cases, however, scores of 61, 60 and even 59
and 58 were recorded as (65).

In addition, scores that should have been recorded
as (65) were sometimes recorded as 65.
This occurred on twenty-four papers on the Biology
Examination for January 1950.
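
Two of the State rules described under (b) and (d) can be stated as short checks. The sketch below is illustrative only: the function names are hypothetical, the three-point allowance is read from the 62 to 64 range given in the text, and the Part II rule assumes five required items, as stated under (b).

```python
def items_to_score(answered_items, required=5):
    """Items to be scored on Part II: the first `required` answered items
    in order of appearance; any extra items at the end are omitted."""
    return sorted(answered_items)[:required]


def counts_as_passing(score, cutting_score=65, allowance=3):
    """True if the score is at the cutting score or within the allowance."""
    return score >= cutting_score - allowance


if __name__ == "__main__":
    # A student answered all eight items; only items 1 through 5 count.
    print(items_to_score([1, 2, 3, 4, 5, 6, 7, 8]))
    for s in (65, 62, 61, 58):
        print(s, counts_as_passing(s))
```

Under the first rule a weak answer in the middle of the paper cannot be passed over in favor of a stronger later one. Under the second, the scores of 58 through 61 that were found recorded as passing fall outside the allowance.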
e) Counting scores of 72, 73, and 74 as (75).
In some cases, such as in schools that do not
offer ... work, and in "short-term"
courses (... classes), a score of 75 is required
for a passing grade. In many cases
teachers counted 72, 73 or 74 as (75), similar
to counting 62-64 as (65). While this is not a
legal procedure, the State has accepted it in
the past ... such scores as falling
within ... similar to the 62-...
However, in a number of cases in
which 75 was required for passing, the
teachers ...

3. Errors and Inconsistencies in Correction.
The lists made by the investigators of
... revealed several types of errors
that could be classified as being actual mistakes
or inconsistencies in scoring the papers. Examples
of such errors follow:

a) Giving credit for parts of an item that
should have been rejected. As indicated previously,
many Part II items contained seven or
eight parts, of which the student was to answer
five or six and omit the remainder. Sometimes
a scorer would fail to note that all parts of such
an item had been answered by a student, would
give credit for all parts, and hence award more
than the maximum allowable number of points
to the student. Such an error occurred in
...-two out of 2441 Biology Examinations
for January 1950.

b) Giving credit for an entire item that should
have been rejected. An error similar to (a)
occurred when a student had the option of
omitting ... entire items in Part II.
Occasionally a student would answer all the items,
and again the scorer would correct all of
them, giving the student extra points.

c) Failure to note the omission of entire items
or parts of items. A condition just the reverse
of cases (a) and (b) above occurred when
a teacher did not notice that a student had omitted
an entire item or part of an item. This
type of error occurred quite easily if the teacher
was scoring the paper by deducting points
rather than totaling the number of positive
points, as suggested by the State. The teacher
would simply add the number of points deducted
and subtract from the maximum
number of points allotted the item, failing to
note the omission. Had the paper been scored
by the positive-point method, such an error
would ... Errors of this type
... ...enty out of 1699 papers in
Earth Science for June 1949.

d) Failure to correct an item or part
of an item. Occasionally a teacher would overlook
an entire item or part of an item. In such a
case the student did not receive as high a score
as he deserved. This type of error was detected
on fifty-nine out of 2441 Biology Examinations
for January 1950.

e) Errors in awarding points. The items on
Part II of the examinations consist of from two
to eight or nine parts. The maximum point
value of the items is in each case ten. However,
the values of the parts vary from item to
item, depending on the number of parts in the item
or the complexity of the part. For example,
one item might consist of three parts of values
five, three, and two respectively; while another
item might consist of five parts of two points
each. In many cases the teachers awarded
incorrect numbers of points for items; for example,
parts were given three points credit when the
maximum value was only two.

The opposite situation, the awarding of too
few points, was difficult to detect. However,
one bundle of sixty Physics papers was tallied
on which the teacher did not give full credit
(three points) in any case. It is difficult to believe
that all sixty students failed to answer the
item correctly. Hence it seems reasonable to
assume that the teacher thought that the maximum
value of the item was two, rather than
three.

f) Inconsistencies in scoring. Many inconsistencies
in scoring were noted, not only within
the work of a single teacher, but also between
the scoring procedures of different teachers.
An answer that would receive full credit from
one teacher might receive only partial credit or
no credit from another. Such situations were
especially obvious on items that required drawings
or diagrams. Often one teacher would apparently
give more credit for neat, artistic work,
while another would consider only the scientific
accuracy of the drawing.

Similar inconsistencies occurred even within
the work of one teacher. For example, one
Physics Examination included an item involving
units of electricity. One teacher gave credit
for an answer of 2520 watts, but consistently
marked as incorrect answers of 2.52 kilowatts.
Since the students were given no instructions in
the item as to the units to be used, it would appear
that both answers should be considered
equally correct.

g) Obvious errors in scoring. In many cases
errors were detected (1) where a correct answer
was consistently scored by a teacher as
being wrong, and (2) where obviously incorrect
answers were marked correct. It may be assumed
that in such cases the teacher simply
did not know the correct answer to the item.

4. Miscellaneous Errors in Scoring. Certain
types of errors appeared that were difficult
to classify under any of the previous categories.
Hence they were grouped under this heading.
They are as follows:

a) Errors in transposition. In most cases
the examination papers of the individual students
are stapled to a cover page on which the
teacher lists the total Part I score, the scores
awarded on the individual Part II items, and
the Part II total score. These are then totaled
on this cover sheet to show the student's final
score on the entire examination.

Many times errors were made in the trans- 
fer of these partial scores to the cover page. 
For example, such errors occurred on 118 out
of 1974 papers (about six percent) for the
Chemistry Examination for June 1949.

It is difficult to determine whether these er- 
rors are due to the carelessness of the scorer, 
or whether they are intentional. However, it 
is interesting to note that most of these errors 
accrued to the benefit of the student; that is, the
score on the cover was higher than the actual
score. Occasionally, however, a scorer would
fail to record the score obtained on an entire
item on Part II. Thus the student received a
lower score than he earned.

b) Obvious "upgrading" of scores. In some
cases it was quite evident that the scorer had
re-marked a paper to give a student a higher
grade than he earned or had simply changed the
total grade to bring it up to a passing score.
Such cases are difficult to construe as anything
but dishonesty on the part of the scorer. For
example, twenty-two such papers were identified
in the Chemistry Examination for June 1949.

c) Failure to grade a paper completely. A
situation somewhat related to (b) was an obviously
dishonest practice that was detected on a few
occasions. In several instances papers were
found in which the scorer had corrected Part I
of the paper only. The scorer then simply
credited the paper with a sufficient number of


points on Part II to bring the total up to the pass- 
ing level. In some cases the 


Recommendations 


As a result of these findings the following
recommendations seem reasonable:


1. It is recommended that the State of New
York provide a more specific list of instructions
for the scoring procedures to be followed in
correcting the Regents Examinations in Science.

2. It is recommended that a scoring key be
provided for Part II of the examinations, as
well as for Part I. It is realized that because
of the nature of the Part II items, it is difficult
to construct an absolute scoring key. However,
it would seem desirable to prepare a
"scoring guide" that would suggest point values
for whole or partial answers.

3. It is recommended strongly that the State
continue to spot-check the examination papers
in an attempt to identify scoring errors and
inconsistencies, and that, further, it notify
the science supervisors or administrators in the
schools in which the errors most frequently occur
of their type and extent. This may reduce
the appearance of these errors on future
examinations.


SECTION VII 


AN ANALYSIS OF THE SCORES OBTAINED ON 
THE TEST ITEMS ON THE REGENTS EX- 
AMINATIONS IN SCIENCE 


The Problem 


THE PROBLEM of this phase of the investigation
is to analyze the individual items on
the sixteen Regents Examinations in Science that
were studied, with respect to these points:


1. The degree of difficulty of the various 
types of items 

2. The discriminating power of the items 

3. The popularity of certain items 


Methods Employed


In order to determine the degree of difficulty
and the discriminating power of the individual
items of the examinations, the average or
percentage score obtained on each item was computed
as described below. For Parts I of the tests
(each of which is composed of fifty short-answer
type items of unitary value) the average
score for each item was determined for each of
the total score groups (65 through 100) by dividing
the number of students answering the item
correctly by the total number of students receiving
the respective score.

As has been stated, Parts II of all the examinations
are composed of eight or nine essay-type
items, each of which bears a total value of
ten points. Of these, the student may choose
any five. Each essay item consists of from
two to ten parts of varying point values. To
calculate the percentage scores on these parts, the
total number of points earned by all the students
answering the part was divided by the maximum
number of points that they could have obtained
had they all answered it correctly. This was
done for each item for each total score group
from (65) through 100.
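
The two calculations described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' tabulating procedure: the function names and the input layout (one tuple per student paper) are hypothetical and are introduced here for clarity.

```python
from collections import defaultdict


def part1_item_averages(papers):
    """Average score on one Part I item for each total score group.

    papers: iterable of (total_score, answered_correctly) pairs, one per
    student paper. This layout is an assumption made for illustration.
    """
    answered = defaultdict(int)
    correct = defaultdict(int)
    for total_score, answered_correctly in papers:
        answered[total_score] += 1
        if answered_correctly:
            correct[total_score] += 1
    # Students answering correctly divided by students in the score group.
    return {score: correct[score] / answered[score] for score in answered}


def part2_percentage_scores(papers, max_points):
    """Percentage score on one Part II part for each total score group.

    papers: iterable of (total_score, points_earned) pairs for the students
    who chose to answer the part (hypothetical layout).
    max_points: point value of the part.
    """
    earned = defaultdict(int)
    chosen = defaultdict(int)
    for total_score, points in papers:
        earned[total_score] += points
        chosen[total_score] += 1
    # Points earned divided by the maximum the same students could have earned.
    return {s: earned[s] / (chosen[s] * max_points) for s in chosen}
```

Note that the Part II denominator counts only the students who elected the part, which follows the wording "all the students answering the part."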


Degrees of Difficulty of the Items


The average and percentage scores thus obtained
were analyzed to determine the degree
of difficulty of each item on each of the sixteen
examinations. It is obvious that if the average
or percentage score of any item was consistently
low for all the total score groups, the item
must be difficult. Conversely, if the score was
consistently high, the item must be easy. The
items were classified arbitrarily as being
"easy," of "average difficulty," or "difficult"
on the basis of the following criteria: (1) if
approximately one-half or more of the average or
percentage scores were above .90, the item
was considered easy; (2) if approximately one-half
or more were below .50, the item was
considered difficult; and (3) those falling between
these ranges were considered to be of average
difficulty.
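
The three criteria can be expressed as a short Python function. This is a sketch, assuming that the scores are given as proportions between 0 and 1 and that "approximately one-half or more" is read as exactly one-half or more; the function name and the cut-off parameter names are hypothetical.

```python
def classify_difficulty(scores, easy_cut=0.90, hard_cut=0.50):
    """Classify one item from its scores across the total score groups.

    scores: average or percentage scores as proportions (0.0 to 1.0),
    one per total score group.
    """
    if not scores:
        raise ValueError("at least one score is required")
    half = len(scores) / 2
    above = sum(1 for s in scores if s > easy_cut)
    below = sum(1 for s in scores if s < hard_cut)
    if above >= half:
        return "easy"
    if below >= half:
        return "difficult"
    return "average difficulty"
```

An item whose scores fall half above .90 and half below .50 satisfies both of the first two criteria; this sketch reports it as "easy" only because that test is made first, a case the criteria themselves do not settle.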
The items thus categorized were then studied
to determine whether any one
type seemed to be consistently easy or difficult.
From this analysis, a list was made
that included the general subject-matter areas
containing the largest numbers of difficult items
on the examinations in each of the science fields.
Examples of each type are cited below:


Difficult Items 


I. Biology
A. Plant and Animal Physiology
1. "Give two adaptations of a root hair that help
it to perform its function." (January 1950,
Part II, 1 a[...])

2. "[...] a plant tissue cell. State its [...]
function and describe how it is [...] to perform
this function." (June 1950, Part II, [...] b)

3. "The mouth waters when food is present because
salivary glands have been stimulated by neurons."
(January 1949, Part I, 34)

B. [...] and Heredity
1. "To the species involved, mutations are
(1) always harmful, (2) always useful, (3) usually
harmful, (4) usually useful." (January 1950,
Part I, 19)

2. "Give an explanation for the following
statement(s): In some cases the marriage of first
cousins results in very desirable offspring; in
other cases very undesirable offspring result."
(January 1950, Part II, [...])

3. "An animal has four chromosomes in each body
cell. State the number of chromosomes in (1) a
primary egg cell." (January 1950, Part II, 5 b1)

C. General Terminology
1. "The process of boiling milk to kill all
bacteria is sterilization." (January 1950,
Part I, 50)

2. "The part of the seed that will develop into
the plant is the ______." (January 1949,
Part I, 38)

3. "An example of an antibiotic is sulfadiazine."
(June 1950, Part I, 25)

There were also moderate numbers of difficult
items in the general categories of comparative
anatomy, biochemistry, and history of biology.


II. Chemistry
A. Organic Chemistry
1. "Hard coal consists chiefly of (1) carbohydrates,
(2) combined carbon, (3) uncombined carbon,
(4) hydrocarbons." (January 1950, Part I, 17)

2. "Describe one method of making methyl alcohol."
(January 1950, Part II, 6 e1)

3. "Write the structural (graphic) formula for
(1) chloroform, (2) ethylene." (June 1949,
Part II, 6 b)

B. Atomic Weights
1. "The weight of 22.4 liters of hydrogen is
approximately (1) 0.09 grams, (2) 2 grams,
(3) 1 gram, (4) 22.4 grams." (January 1949,
Part I, 33)

2. "The weight of nitrogen compared with an equal
volume of air is approximately (1) one-half as
great, (2) the same, (3) twice as great,
(4) fourteen times as great." (June 1950,
Part I, 47)

C. Commercial Reactions
1. "Charcoal is a product of the process that also
produces (1) acetic acid, (2) coal tar, (3) coke,
(4) gasoline." (January 1950, Part I, 16)

2. "The reaction of carbon monoxide and hydrogen is
used commercially to make (1) carbonic acid,
(2) chlorine, (3) methanol, (4) soap." (June 1950,
Part I, 27)

3. "Name two products which are obtained from coal
tar." "State one use for each product mentioned
in c." (June 1949, Part II, 6 c, d)

4. "What substance may be treated with chlorine to
manufacture bleaching powder?" (June 1950,
Part II, 3 e)

D. Laboratory Procedures and Techniques
1. "If too much air is allowed in the fuel mixture,
the Bunsen flame will (1) become colorless,
(2) become yellow, (3) deposit soot, (4) strike
back." (June 1949, Part I, 28)

2. "Give the reagents used in the laboratory
preparation of (1) nitric acid, (2) ammonia."
(June 1949, Part II, 8 d)

3. "State briefly how to prepare hydrogen from
water and sodium chloride." (June 1950,
Part II, 5 [...])

E. Equations
1. "Write a completely balanced equation for the
reaction between copper and hot concentrated
sulfuric acid." (June 1950, Part II, 1 e)

2. "Write an ionic equation to show what happens
when an oxygen ion is converted to an O atom."
(June 1949, Part II, 4 c)

Other difficult items include those involving
terminology, characteristics of elements and
compounds, and everyday applications of chemistry.


III. Earth Science
A. Geology (this category included by far the
largest number of difficult items)
1. "Headwater erosion of a valley glacier results
in the formation of a (an) ______." (January 1950,
Part I, 2)

2. "Explain the following true statement(s): The
Catskill Mountains are classified as a plateau
region." (January 1950, Part II, 5 d)

3. "Explain how weathered rock may again become
bedrock." (January 1949, Part I, 1 d)

4. "An intrusion of igneous rock that cuts across
the rock layers is called a (1) dike, (2) fault,
(3) laccolith, (4) sill." (June 1950, Part I, 31)

B. Weather
1. "Distinguish between absolute and relative
humidity." "Explain why relative humidity decreases
as temperature increases." (January 1949,
Part II, 2 c, d)

2. "Air descending the side of a mountain becomes
compressed. Why does this make the air
comparatively dry?" (January 1950, Part II, 6 c)

3. "Barometric pressure recorded on a
weather-bureau station model as 247 would be read
(1) 924.7, (2) 1002.47, (3) 1024.7, (4) 1247
millibars." (June 1950, Part I, 23)

C. Astronomy
1. "The planet which is about the same size as the
earth is (1) Mars, (2) Venus, (3) Mercury,
(4) Uranus." (June 1949, Part I, 29)

2. "Explain the following: In New York State, the
altitude of the noon sun is higher during the
summer than it is during the winter." (January
1949, Part II, 5 b)


IV. Physics
A. Sound
1. "State two conditions under which two sound
waves of the same amplitude will produce complete
interference." (January 1950, Part II, 6 c; and
June [...], Part II, 4 d)

2. "The note produced by a string vibrating as a
whole is called a (an) overtone." (January 1949,
Part I, 38)

3. "Find the fundamental frequency in vps. of a
note produced by a pipe closed at one end, if the
length of the air column is six inches. Air
temperature is 20 degrees C." (June [...],
Part II, 4 c)

B. Electricity
1. "During the discharging of a lead storage cell,
the amount of [...] in the cell ______."
(June 1949, Part I, 44)

2. "An electric heater has two coils with
resistances of 40 ohms and 60 ohms. The heater
operates on a 120-volt circuit. It is equipped with
a switch that allows either coil to operate [...]
in series." "In which of the three possible
operating circuits is the heat developed greatest?"
(January 1950, Part II, 4 [...])

3. "An iron wire has more resistance than a copper
wire of the same dimensions, and an aluminum wire
has more resistance than the copper wire of the
same dimensions. Compare the current in the three
wires and state in which wire the most heat is
generated when they are connected to a battery
(1) in series; (2) in parallel." (June 1950,
Part II, 5 b)

C. Lenses and Mirrors
1. "The image of an object viewed through a concave
lens is always erect and [...] than the object."
(June 1949, Part I, 26)

2. "A woman sees a full-length image of herself in
an upright plane mirror. The minimum length of the
mirror is (1) exactly the same as, (2) one-half,
(3) twice, (4) independent of the height of the
woman." (January 1950, Part [...])


Other items that seemed difficult occurred in the
areas of heat and mechanics.

It is interesting to note that the total number
of items categorized as being "easy" far
outnumbered those categorized as "difficult."
This, of course, is explained partly by the fact
that the examinations analyzed all received passing
scores between 65 and 100. Hence, the examination
items would naturally consist chiefly of those
receiving high average and percentage scores.
Easy Items

I. Biology
A. Conservation
1. "Contour plowing is done in an effort to
(1) beautify the farm, (2) control weeds,
(3) discourage insects, (4) [...] the topsoil."
(January 1949, Part I, 6)

2. "Explain the relationship of forests to [...]
of the following: (1) flood control,
(2) prevention of erosion, (3) preservation of
wildlife." (January 1950, [...])

B. Plant and Animal Physiology
1. "[...] the body wastes are excreted by the
lungs, skin and (1) kidneys, (2) pancreas,
(3) small intestine, (4) stomach." (January 1949,
Part I, 21)

2. "The type of cell in the bloodstream that
increases in number in response to the invasion of
bacteria is the ______." (June 1950, Part I, 36)

3. "State four life functions carried on by a
maple tree." (January 1949, Part II, 1 a)

C. Reproduction and Genetics
1. "An important function of the sperm cell is to
supply the egg with (1) a set of genes, (2) extra
cytoplasm, (3) extra food, (4) important hormones."
(January 1949, Part I, 7)

2. "The union of two unlike sex cells is called
(1) fertilization, (2) maturation,
(3) parthenogenesis, (4) vegetative propagation."
(January 1950, Part I, 3)

3. "Using a keyed and labelled diagram, show the
cross between long and long radishes."
(January 1950, Part II, 9 c1)

D. Everyday Applications of Biology
1. "It is now possible to keep an area quite free
from flies by the use of (1) 2-4D, (2) DDT,
(3) streptomycin, (4) sulfa drugs." (January 1949,
Part I, 5)

2. "State the principal health purpose of each of
two of the following: chest x-rays; Wassermann
test; pasteurization of milk." (January 1949,
Part I, 6 c)
II. Chemistry
A. Chemical and Physical Properties of Elements
and Compounds
1. "Hydrogen sulfide is most easily recognized by
its (1) color, (2) density, (3) odor, (4) state."
(January 1949, Part I, 1)

2. "The lightest of the following gases is:
(1) NH3, (2) NO, (3) N2O, (4) NO2." (June 1949,
Part I, 1)

B. Chemical Reactions
1. "The solution resulting from the reaction
between sodium and water contains (1) an acid,
(2) an anhydride, (3) a base, (4) a salt."
(January 1949, Part I, 10)

2. "The reaction of a carbonate with an acid
yields (1) carbon dioxide, (2) carbon monoxide,
(3) hydrogen, (4) oxygen." (June 1949, Part I, 16)

3. "Give three reasons why a chemical reaction may
go to completion." (January 1950, Part II, 4 d)

C. Everyday Applications of Chemistry
1. "The growth of a legume, such as clover, adds to
the soil a compound of (1) nitrogen,
(2) phosphorus, (3) potassium, (4) sulfur."
(January 1949, Part I, 22)

2. "Goiter may be caused by a diet deficient in
(1) bromine, (2) chlorine, (3) fluorine,
(4) iodine." (January 1950, Part I, 25)

D. Laboratory Procedures
1. "To prepare bromine in the laboratory, add
sulfuric acid to (1) NaBr, (2) NaBr and MnO2,
(3) NaCl and NaBr, (4) MnBr2." (June 1949,
Part I, 37)

2. "A catalyst used in a preparation of oxygen is
(1) manganese dioxide, (2) mercuric oxide,
(3) potassium chlorate, (4) potassium chloride."
(January 1950, Part I, 33)

3. "Draw a diagram of the apparatus used in
preparing and collecting ammonia in the
laboratory." (January 1949, Part II, 5 c)

Other easy items in Chemistry included those
involving the writing of balanced equations, and
the knowledge of symbols and formulae.


III. Earth Science
A. Geology
1. "Physical and chemical action on exposed rock
surfaces by atmospheric agencies is called
(1) erosion, (2) corrosion, (3) suspension,
(4) weathering." (June 1949, Part I, 23)

2. "The breaking of minerals in such a way that
smooth plane surfaces are produced is known as
(1) cleavage, (2) fracture, (3) luster,
(4) streak." (January 1949, Part I, 17)

3. "The peeling or splitting-off of outer layers of
rock due to temperature changes is called
(1) cleavage, (2) exfoliation, (3) faulting,
(4) fracture." (June 1950, Part I, 35)

B. Weather
1. "Closely spaced isobars on a weather map
indicate ______ winds." (January 1949, Part I, 8)

2. "When the air is completely saturated with
moisture, the relative humidity is zero %."
(June 1949, Part I, 38)

3. "State two characteristics of weather that an
mT air mass will bring to New York State."
(June 1950, Part II, 2 b)

4. "Distinguish between weather and climate."
(January 1950, Part II, 6 a)


IV. Physics
A. Electricity and Magnetism
1. "The filament now used in most electric lamps
is made of ______." (January 1950, Part I, 24)

2. "When the south pole of a magnet is brought near
the head of an iron nail, the head of the nail
becomes a south pole." (June 1950, Part I, 33)

3. "A step-up transformer used to operate a neon
sign has a turn ratio of 1:100. The primary voltage
is 110 volts. The primary current is 10 amperes.
The secondary current is .09 ampere. Find the
(b) wattage of the primary; (c) wattage of the
secondary." (January 1949, Part II, 6 b, c)

B. Mechanics
1. "The moment of a 20-pound force pushing
perpendicularly on a lever five feet from the
fulcrum is ______ pound-feet." (June 1949,
Part I, 20)

2. "The theoretical mechanical advantage of a wheel
and axle is 6. The wheel diameter is 12 inches.
The axle diameter is ______ inches." (January 1950,
Part I, 20)

3. "A 500 pound weight is drawn up an inclined
plane 15 feet long and 3 feet high. The effort
required is 125 pounds. Find the actual mechanical
advantage." (June 1950, Part II, 3 a)

C. Density
1. "As a liquid contracts, its density ______."
(June 1949, Part I, 37)

2. "Two solids show equal apparent losses of weight
when submerged in water. Their densities must be
equal." (June 1950, Part I, 38)

3. "The apparent weight of 3 cubic feet of metal
submerged in water is 375 pounds. (Density of water
is 62.5 pounds per cubic foot). Find the (a) volume
of water displaced; (b) weight of water displaced;
(c) weight of metal in air." (January 1949,
Part II, 1 a, b, c)

Other easy items in Physics included many in the
areas of light, sound, and the use of scientific
[...].


Following the analysis of the difficulty of the
items in terms of subject-matter, an analysis was
made to determine the relationship between the form
in which the items were written and their degrees
of difficulty.

Parts I of all the examinations are composed of
short-answer type items such as the multiple
choice, modified true-false, and completion types.
Parts II of the examinations are more subjective in
nature, and include essay-type items that [...]
more explanations and de[...]; the drawing or
interpretation of diagrams; [...] problems; and the
writing of equations.

On the Biology Examinations, it was found that the
greatest percentage of the difficult items on
Part I were of the completion type, while the
easiest were the multiple-choice. On the Part II
items, no particular type of item seemed easier or
more difficult than the others.

The only type of short-answer item found on the
Chemistry Examinations was the multiple-choice
type. Hence, no comparison could be made. On
Part II, however, the highest percentage of the
difficult items included those requiring
explanations, and those that demanded the writing
of equations.

On Part I of the Earth Science Examinations, the
modified true-false and the completion items seemed
to be about the easiest for the students, while the
multiple-choice seemed to be the most difficult. On
Part II the percentage of difficult essay items was
high, but among the easy items were those that
required drawings or interpretations of diagrams.

On Parts I of the Physics Examinations, the three
types of short-answer items were of approximately
equal difficulty. On Parts II, however, the
percentage of easy mathematical items was high,
although a number of mathematical items seemed to
be difficult, also.

An overall comparison of the items on Parts I and
II of the examinations indicates that a
substantially higher percentage of difficult items
appears on Part II than on Part I. Hence, it
appears that in general, the short-answer type
items are less difficult for most students than the
"essay-type." However, the data summarized fail to
reveal any consistent relationship between the
degrees of difficulty of the various types of
items. Hence, the specific form in which the item
is written does not appear to be a significant
factor in its degree of difficulty.


Summary 


From an analysis of the degrees of difficulty
of the various examination items, a few
generalizations may be made:


1. Items involving current science information
(such as antibiotics, the hydrogen bomb, etc.)
appear to be more difficult than other types.
A possible explanation may be the fact that
many textbooks are not up-to-date.

2. Many of the difficult items are also ambig- 
uous, and hence present difficulty for the stu- 
dent in answering, as well as problems of scor- 
ing for the teacher. 

3. Items involving the use of scientific atti- 
tudes, applications of knowledge, and the use of 
elements of scientific method are in general 
more difficult than the factual type. 

4. Essay-type items are generally more dif- 
ficult than the short-answer items. 


Discriminating Power of the Examination Items

In general, there are two different views with
respect to the concept of discriminating power
of an examination item:

1. If an item is designed to measure one precise
objective, it will ideally cut the examination
group into two sections, namely, those students
above a given score who answer the item
correctly, and those below it who answer the item
incorrectly. If the scores for such an ideal item
were plotted, the resulting histogram would have
the following configuration:

[Figure: histogram for an ideal item of this kind;
horizontal axis, total test score (0 to 100)]


2. If an item is designed to measure a generalized
objective, or a multiple set of objectives; or if
the item is used to test a group of individuals
whose quality and quantity of training with respect
to the objectives differ, then the average scores
of the item would ideally rise gradually as the
total scores of the group increase. A graph similar
to the following would result:

[Figure: graph for an ideal item of this kind;
horizontal axis, total test score (0 to 100)]


The New York Regents Examinations in Sci- 
ence are designed to measure a multiple set of 
objectives. In addition, although guided by a 
State Syllabus, no teacher is required to pre- 
sent a given course of study. Hence, no two 
teachers are likely to teach the same kind or 
amount of science material. Therefore no two 
groups of students who take the examinations 
are likely to have the same training. For this 
reason, the discriminating power of the exam- 
ination items needs to be evaluated in terms of 
the second viewpoint described above. 

Thus, the average scores of the items on
Parts I of the examinations and the percentage
scores of the items on Parts II were analyzed
to determine whether they increased as the total
scores increased from 65 to 100. The following
criteria were established for use in this
study, as a measure of discriminating power:


1. An item was considered to have excellent
discriminating power if (after the initial
increase) its average score, in seventy-five
percent of the cases, increased consistently (a one
to ten percent increase) with each one-point
increase in the total examination score.

2. An item was considered to have moderate
or "average" discriminating power if the
increase in twenty-five percent or more of its
average or percentage scores fluctuated
between ten and twenty-five percent.

3. An item was considered to be a poor
discriminator if the increase in twenty-five
percent or more of its average or percentage
scores fluctuated by more than twenty-five
percent.

4. All items that had consistently low or
consistently high average or percentage scores
were also considered to have poor discriminating
power.
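
One possible reading of these four criteria is sketched below in Python. The criteria leave several points open, so the sketch rests on assumptions that are not taken from the study: scores are given as percentages ordered by rising total score, "consistently low" and "consistently high" are read as every score lying below 50 or above 90, the allowance for an "initial increase" is ignored, and an item meeting none of the criteria is reported as "unclassified". The function name and parameters are hypothetical.

```python
def classify_discrimination(scores, low_cut=50.0, high_cut=90.0):
    """Classify an item's discriminating power from its scores.

    scores: average or percentage scores (0 to 100), one per total score
    group, ordered from the lowest total score to the highest.
    """
    if len(scores) < 2:
        raise ValueError("at least two score groups are required")
    # Criterion 4 (assumed reading): every score is low, or every score is high.
    if all(s < low_cut for s in scores) or all(s > high_cut for s in scores):
        return "poor"
    # Change in item score for each one-point step in total score.
    steps = [later - earlier for earlier, later in zip(scores, scores[1:])]
    n = len(steps)
    steady = sum(1 for d in steps if 1 <= d <= 10)
    moderate = sum(1 for d in steps if 10 < abs(d) <= 25)
    large = sum(1 for d in steps if abs(d) > 25)
    if large >= 0.25 * n:       # criterion 3
        return "poor"
    if steady >= 0.75 * n:      # criterion 1
        return "excellent"
    if moderate >= 0.25 * n:    # criterion 2
        return "average"
    return "unclassified"
```

The order of the tests matters: an item with many large fluctuations is marked poor before it can qualify as excellent or average. That ordering is a design choice of the sketch.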


Results


Table XIV summarizes the findings of the
analysis for the discriminating power of the
examination items.

Table XIV indicates that on the Biology
Examinations, the greatest number (approximately
sixty percent) of the items was found to have
average discriminating power, while only about
four percent were found to be excellent
discriminators. Of the poor discriminators, four
percent were so classified because of the great
fluctuation of their average scores. Approximately
thirty percent were considered poor because they
were easy, and two percent because they were
difficult. It is interesting to note that of the
easy items the vast majority were found on Part I.

There were only three (about one percent)
of the Chemistry items that could be classified
as excellent discriminators on the basis of the
criteria outlined above. Again, the majority of
the Chemistry items (about forty-three percent)
were found to be of average discriminating power.
Of the poor discriminators, twenty-four
percent were so classified because their average
scores fluctuated greatly; about thirty percent
were easy, and about two percent difficult.
Again, the majority of the easy items appeared
on Part I of the examinations.

On the Earth Science Examinations, about
two percent of the items were considered as being
excellent discriminators, while approximately
fifty-two percent were average. Twelve percent
were considered poor because their average
scores fluctuated greatly, about thirty-one
percent because they were easy, and about one
percent because they were difficult.

Of the items on the Physics Examinations,
three percent were found to have excellent
discriminating power, while fifty-three percent
were found to be average discriminators. Of the
poor discriminators, nineteen percent were so
categorized because their average scores
fluctuated greatly, twenty-three percent because
they were easy, and one percent because they
were difficult.

Items in the following areas were classified
as having poor discriminating power:


I. Biology
1. "A plant embryo with a food supply and
protective coat is called (1) a fruit, (2) a
seed, (3) an embryo sac, (4) an ovule."
(January 1950, Part I, 4)


2. "Tell whether each of the following is
true or false and give your reasons."
"Poison ivy can be destroyed by pouring

TABLE XIV


PERCENTAGES (APPROXIMATE) OF ITEMS OF EXCELLENT,
AVERAGE AND POOR DISCRIMINATING POWER


[Table: one row for each type of test (Biology,
Chemistry, Earth Science, Physics); columns for
Excellent, Average, and Poor Discriminating Power]

TABLE XV

THE POPULARITY OF THE ITEMS ON PARTS II OF THE SIXTEEN
REGENTS EXAMINATIONS


ч Exemination 
B А 9 
iology, 3,9 
June 5,4 
den. 3,4 2 
une 
Chemi 1,2,3,5 
emistry, Jan, 1949 B 1,2,4,5 
June 1949 6 5 1,2,5,4 
Jan. 1950 4 3,5 1,2 
June 1950 
E Р 4:6 5,7 1,5,8 
arth Science, Jan. 1949 é aie 6 1,3,8 
June 1949 4, 5,6,7,8 1,2,5 
Jan. 1950 1,3,7 4,6 2,5 
June 1950 
à 1 2,5,6 
| Physics, 1; 2 
! 1,2,3,4,5 
2,8,5 




salt water on its roots." (June 1949,
Part II, 2 b1) (See Graph I)


II. Chemistry
1. "The action of the proposed hydrogen bomb
involves a change of hydrogen to (1) argon,
(2) radium, (3) helium, (4) uranium."
(June 1950, Part I, 50)

2. "Describe how to make acetylene."
(June 1950, Part II, 7 a)


III. Earth Science
1. "Feldspar may change to ______ when acted on
by moist air." (January 1950, Part I, 9)

2. "Explain why relative humidity decreases
as temperature increases." (January 1949,
Part II, 2 d)


IV. Physics
1. "A balloon will rise until it displaces its
own weight of air." (True-false) (June 1950,
Part I, 31)

2. "The diagrams (of saxophone and violin
sounds) represent the wave patterns of the same
note sounded on two different instruments. State
one respect in which the sounds are similar.
State one respect in which [...] different."
(January [...])

The following are examples of items showing
excellent discriminating power:


I. Biology


II. Chemistry
1. "Isotopes of uranium have different (1) atomic
numbers, (2) atomic [...] numbers of planetary
electro[...] numbers of protons." (June 195[...],
Part I, 48)


III. Earth Science
1. "The material deposited by a stream at the
base of a mountain forms a (an) ______."
(January 1950, Part I, 17)


IV. Physics
1. "A bottle can hold 120 grams of water. The
same bottle can hold 96 grams of alcohol. The
volume of the bottle is ______ cu. cm. The
specific gravity of the alcohol is ______."
(June 1949, Part I, 18, 19)


Summary 


The following generalizations may be made
relative to the discriminating power of the items:


1. Few of the items on any of the examinations
could be considered as having excellent
discriminating power.

2. The greatest percentage of the items were
classified as average or poor discriminators.

3. There was an extremely small percentage
of items showing consistently low scores,
while a large number of items had consistently
high average or percentage scores. Of these
latter, the majority were items on Part I.


Popularity of Items 


Since a student has an opportunity to choose
five out of eight or nine items on Part II of the
examinations, it was decided to analyze the items
with respect to their "popularity" with the
students. To do this, the percentages of persons
electing the various items were determined by
dividing the number of students choosing an item
by the total number of students obtaining a
particular total score. This was done for each
of the total score groups from (65) to 100. The
percentages were then categorized on the basis
of the following criteria:10


1. If the percentage was thirty or below, the
item was considered to be "unpopular" with a
single score group.

2. If the percentage was seventy or above,
the item was considered to be "popular" with a
single score group.

3. Items whose percentages ranged between
thirty and seventy were considered to be of
average popularity with a single score group.


Based on these criteria, the popularity of
each item was tabulated for each total score
group from (65) to 100. These tabulations were
then grouped as follows:


1. If the item was popular with seventy-five t- 
Percent or more ot the Score groups, it was lis 
€d as popular, 


- If the item was 
five per 
listed a, 
3. If the item Was of average popularity with 
Seventy-five percent or More of the scores 
ioe it was listed as being of average рори" 
1 


4. Ἡ there were a, i l num” 
bers of Pproximately equa 


items in both the unpopular and average 


(Vol. 24 


85 


9 


MALLINSON - BUCK 


September, 1955) 


61008 4604 16404 


оо 66 86 16 96 56 46 το 76 16 ος 68 ВВ 18 98 58 +8 $8 18 18 08 6. 9L LL % εἰ фа съ ZL VL о 69 ө % 5 (6 


(нямоа NOLLVNIWIHOSId HOOd) 
6761 ANAC 'NOLLVNIWVX3 ADO'IOIH СП 1нуа “14 г WALI ЯО SHHOOS HDVHHAY 


IHdVH5 


шөді ЧО 61008 оЗвледз 


JOURNAL OF EXPERIMENTAL EDUCATION (Vol. 24 


86 


©1098 4804 [u303 


9966/9615 36.19 36 Ce. 26; i6 ов 68/90 атлета AQ Cueta. (3,08 61192312 ра әс за sU τὸ іш oD 09 13753 569 


(чямоа NOLLVNIWINOSIG дияттяохя) 


бубт AHVÜNVI “мопумплухя хоототя 'I.LNVd ‘FT WALI NO SH3HOOS Ядутмяоняа 


п науно 


За цеодва 


Шезт uo өлоов og: 


September, 1955) 


Це Ботев, it was called an item of ‘‘medium- 

popularity. '* 

сад m there were approximately equal num- 

559 items in both the popular and average 

hi DS, it was considered to be of “m e dium- 
igh popularity. ” 
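The grouping rules listed above for a single item can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name is hypothetical, the input is assumed to be one label per total-score group, and because the text does not quantify "approximately equal numbers," the sketch returns "mixed" for every case that would fall under the medium-low or medium-high rules.

```python
from collections import Counter


def classify_item(labels: list[str], share: float = 0.75) -> str:
    """Combine per-score-group labels into one overall label for an item.

    labels holds one of "popular", "average" or "unpopular" for each
    total-score group. A label wins when it covers at least `share`
    of the groups; at a share of 0.75 only one label can qualify.
    """
    if not labels:
        raise ValueError("at least one score group is required")
    counts = Counter(labels)
    for label in ("popular", "unpopular", "average"):
        if counts[label] / len(labels) >= share:
            return label
    # Medium-low and medium-high cases are not resolved in this sketch.
    return "mixed"
```

With one label for each total score from 65 through 100 there are 36 score groups, so an item needs the same label in at least 27 of them to be listed under that label.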


A list of the items together with their respective classifications is found in Table XV. An analysis of Table XV reveals that the majority of the thirty-six Biology items were considered to be of average popularity, while four were unpopular, and six popular. On the Chemistry Examinations, the largest number of items, fourteen of thirty-two, were considered to be popular, four unpopular, and seven of average popularity. It is interesting to note that there were no unpopular items on the Earth Science Examinations. Fifteen of thirty-two items were of average popularity, and eleven were popular. Of the Physics items, thirteen of thirty-two were popular, ten were of average popularity, and others were unpopular.

The table reveals also that more of the low-numbered items are in the popular or medium-high categories, while the high-numbered items (those appearing at the end of the examinations) are more often unpopular. This is, of course, partly explained by the fact that many students (particularly those obtaining the higher total scores) answer the items in order (1, 2, 3, 4, 5), thus omitting those with higher numbers.

A survey of the content of the popular and unpopular items indicates that, in general, the popular items are those concerned with the knowledge of factual information. The following are examples:


I. Biology

"In dogs, wire hair is dominant over smooth hair. A wire-haired dog is crossed with a smooth-haired dog. Show by keyed diagrams the cross which would result in: (1) a litter in which no smooth-haired pups could appear, (2) a litter in which a smooth-haired pup could be found." (June 1950, Part II, 2 a)


II. Chemistry

"Answer two of the following: (The atomic weights from the reference tables may be used to the nearest whole numbers, e.g., the atomic weight of Cl becomes 35.) (a) How many grams of sodium hydroxide will be needed to neutralize 189 grams of nitric acid? (b) How many cubic feet of oxygen will be required for the complete combustion of 17 cubic feet of carbon monoxide? (c) How many liters of hydrogen sulfide gas will react with 99.3 grams of Pb(NO3)2?" (June 1950, Part I, 2)


III. Earth Science

"The following questions refer to the accompanying map: (a) Distinguish between contour line and contour interval. (b) State the contour interval of this map. (c) What is the highest possible elevation of hill A? How much higher or lower is hill B than hill A? ..... (g)" (June 1949, Part II, 8)


IV. Physics

"A pulley system is used by a workman to raise a weight of 240 lb. a vertical distance of 24 feet. The workman's effort of 120 lb. moves through a distance of 72 feet. Find (1) the ideal mechanical advantage, (2) the actual mechanical advantage, (3) the efficiency of the pulley system." (January 1950, Part II, 1 a)


The unpopular items were found to be those involving the application of information, the use of elements of scientific method, the use of scientific attitudes, and the use of science in industry. The following are examples:


I. Biology

"A boy without a microscope wants to find out if there are bacteria on his fingers. (1) What is a culture medium and how is it sterilized? (2) List two important steps in his experiment following this sterilization. (3) What would indicate the probable presence of bacteria? (4) What evidence would he require to justify a conclusion that the bacteria had come only from his fingers?" (January 1950, Part II, 8 a)


II. Chemistry

"(a) Describe a process for making ethyl alcohol from molasses. (b) Name a by-product of this reaction. Give a use for the by-product. (c) Describe the manufacture of soap, mentioning the raw materials, the use of salt, and the by-product." (January 1949, Part II, a, b, c)


III. Physics

"(a) An electric motor drives a d-c generator which is used to charge a lead storage battery. (1) State step by step three useful energy changes that occur, beginning with the input to the motor and ending with the energy in the battery. (b) Describe a laboratory experiment that may be used to illustrate two factors that affect the magnitude of an induced emf." (June 1950, Part I, 7 a 1, b)


In addition to the above analysis, still another survey was made regarding the relationship of popular and unpopular items with their degrees of difficulty. It is interesting to note that all or part of sixty-seven percent of the unpopular or "medium-low" items were also considered as being difficult. Hence, as one would expect, it appears that students tend to avoid the more difficult items. Of the popular or "medium-high" items, sixty-four percent appeared among the easy.


Summary 


A general review of the data concerning the 
popularity of items indicates the following: 


1. The largest number of items were classified as being of average popularity; the second highest number, as popular.

2. In general, the popular items appeared early in the examinations, while the unpopular items appeared later.

3. The unpopular items tended to involve the application of information and the use of scientific method and scientific attitudes. The popular items were often of the memory or factual type.


SECTION VIII
SUMMARY AND CONCLUSIONS


... apply to all educational programs and at all levels.

2. The phases of the investigation that dealt with the objective characteristics of the examinations indicate that they are far more reliable and valid than teacher-made tests and compare favorably with the commonly used standardized examinations in science. While the discriminatory power of the items on the various examinations did not prove to be high, those on standardized examinations fail to be much better.

3. In general, the examinations are not prejudicial to the interests of any particular group within New York State. While boys from the large high schools seemed to have the greatest achievement, and girls from small high schools, the least, the superiority and inferiority were neither consistent nor especially marked. Apparently the examinations appear to be as good (or bad) for one group as for another.

4. It does seem that a better system for scoring the examinations is indicated. Apparently teachers have been "on their own" more than may be considered desirable, and as a result a number of irregular scoring practices have occurred. Yet, none of these practices have been sufficiently widespread to cast doubt on the integrity of the science teachers of New York State as a whole.

It would seem, as a final statement, that this study failed to elicit the slightest bit of evidence that the examination system in New York State should be abolished. While it has revealed that the system of Regents Examinations in Science has weaknesses, the weaknesses are relatively the same as those that could be found with any mode of evaluation.


FOOTNOTES 


1. Miller, David John and Mallinson, George Greisen. "An Investigation of the Attitudes of ... Toward the New York State Regents Examinations in Science," Science Education, XXXVI (October 1952), 203-215.

2. Guilford, J. P. Fundamental Statistics in Psychology and Education (New York: McGraw-Hill Book Co., 1950), 161-164.

3. Peters, C. C. and Van Voorhis, W. R. Statistical Procedures and Their Mathematical Bases (New York: McGraw-Hill Book Co., 1940), 399.

4. Credit is due Professor Conway Sams, associate professor of mathematics at Western Michigan College of Education, for helping to develop the statistical design, and for computing the corrections for the analysis of variance with unequal numbers of replications.

5. Lindquist, E. F. Design and Analysis of Experiments in Psychology and Education (Boston: Houghton-Mifflin Co., 1953), 108-120.

6. Snedecor, George W. Statistical Methods (Ames, Iowa: Collegiate Press, 1946), 289-293.

7. The directors wish to express their gratitude to the many science teachers of New York State who contributed their time and efforts to evaluating the word lists.

8. Flesch, Rudolph. The Art of Plain Talk (New York: Harper and Brothers, 1946), 210 pp.

9. Buckingham, B. R. and Dolch, E. W. A Combined Word List (Boston: Ginn and Co., 1938), iii + 185.

10. Examinations. University of the State of New York, Division of Examinations and Testing, Albany, New York, December 1951.

11. It should be noted that since each student must select five of the eight or nine items, the chance factor would result in the selection of any item by approximately fifty-five percent. However, the cutting scores for this analysis of popularity have been set arbitrarily at thirty and seventy percent and hence make allowance for the chance factor.


A COMPARISON OF WECHSLER CHILDREN'S
SCALE AND STANFORD-BINET SCORES
FOR EIGHT- AND NINE-YEAR OLDS


FRANK C. ARNOLD 
Bowling Green State University 
WINIFRED K. WAGNER 
Fremont, Ohio 


The highly verbal nature of the Stanford-Binet Intelligence Scale has long been considered a drawback in the psychological testing of certain children. The examiner in public schools must frequently use performance tests in cases where the Stanford-Binet seems inadequate because of its predominantly verbal character, e.g., the testing of children who work under a language handicap, those handicapped by speech or hearing defects, or those in whom the development of verbal and non-verbal abilities has been unequal (1).

The Wechsler Intelligence Scale for Children (7) or WISC, on the other hand, gives a verbal and performance score as well as a total score. The question arises, however, as to the degree of relationship between scores derived from this scale and those obtained with the Stanford-Binet, which is already widely accepted. It is the purpose of this study to ascertain the relationship between these two scales for a sample of eight- and nine-year olds.

Several studies reported in the literature show a relatively high relationship between the two scales. Cohen and Collier (2), using first and second graders, report Pearson correlations between IQ's on the Stanford-Binet and the WISC of .82 for the verbal scale, .80 for the performance scale, and .85 for the full scale. Frandsen and Higginson (3) found correlations of .71, .63, and .80 respectively between verbal, performance, and full-scale WISC IQ and Stanford-Binet IQ with unselected fourth graders. In a summary of four studies, Pastovic and Guthrie (5) report r's ranging from .63 to .83 between the Binet and the WISC verbal scale, from .57 to .75 for the performance scale, and .71 to .88 for the full scale. Krugman, Justman, Wrightstone, and Krugman (4) obtained correlations of the two scales for various age levels beginning at 5 years, 6 months. They report correlations ranging from .65 to .90 for the verbal scale and from .75 to .90 for the full scale; a range is also given for the performance scale. Findings similar to these are reported by Weider, Moller and Schramm (8).


Procedure 


In the present study, a random sample of fifty children was selected from the eight- and nine-year olds in the third and fourth grades of the Bowling Green, Ohio, elementary school system. The Stanford-Binet Intelligence Scale (Form L) and the Wechsler Intelligence Scale for Children were then administered to each of the fifty subjects. All tests were administered by one person and were scheduled so that not more than one week elapsed between administration of the two scales. The Stanford-Binet administration preceded the Wechsler for one-half the subjects while the Wechsler administration preceded the Stanford-Binet for the remaining subjects. The order of administration was set on the basis of odd-even number of the subject.
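The odd-even counterbalancing just described can be sketched as below. The text does not say which parity received which order, so giving the Stanford-Binet first to odd-numbered subjects is an assumption made only for illustration, and the function name is hypothetical.

```python
def administration_order(subject_number: int) -> tuple[str, str]:
    """Return the order in which the two scales are given to one subject."""
    if subject_number < 1:
        raise ValueError("subject numbers start at 1")
    if subject_number % 2 == 1:
        # Assumption: odd-numbered subjects take the Stanford-Binet first.
        return ("Stanford-Binet", "WISC")
    return ("WISC", "Stanford-Binet")


# Fifty subjects, numbered 1 through 50 as in the sample size reported above.
orders = [administration_order(n) for n in range(1, 51)]
binet_first = sum(1 for first, _ in orders if first == "Stanford-Binet")
```

Under either parity convention the fifty subjects split evenly, so any practice effect from the first test is balanced across the two scales.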


Results 


Means and standard deviations of the IQ's ob- 
tained by children in this sample on the two 
scales are presented in Table I. Comparison of 
these data with those reported by Terman and 
Merrill (6) and Wechsler (7) indicates that ob- 
tained results are quite similar. 

In Table II are presented the correlations between the Stanford-Binet and WISC scores with which we are concerned here. For purposes of comparison with reliability data of the Stanford-Binet, correlations between IQ's have been used as well as between mental ages and scaled scores. Correction of these r's has been made to take into account the differences between standard deviations obtained with this group and those reported by Wechsler (7).

TABLE I

COMPARISON OF IQ'S OBTAINED BY FIFTY CHILDREN ON STANFORD-BINET AND
WECHSLER INTELLIGENCE SCALE FOR CHILDREN

                      Stanford-               WISC
Measure               Binet       Verbal   Performance   Full
Mean                  104.52      101.88     104.70      103.34
Standard Deviation     15.66       12.75      15.40       13.59


TABLE II

CORRELATIONS BETWEEN WECHSLER INTELLIGENCE SCALE FOR CHILDREN AND
STANFORD-BINET

                                          WISC
Item Correlated                 Verbal   Performance   Full
Mental Age with Scaled Score     .77        .69         .81
IQ                               .85        .75         .88
IQ (Corrected)                   .88        .74         .90


In assessing the interchangeability of the two scales for use in working with children, a logical approach would seem to be that of determining whether the relationship between the scales differs significantly from the reliability coefficient of one of the scales. Our concern here would be a comparison of the correlation coefficients obtained between the WISC and the Stanford-Binet and between Forms L and M of the Stanford-Binet. If the relationship found between the WISC and Binet is not significantly different from that between two forms of the Binet, then the use of the WISC as a substitute for the Binet IQ would seem reasonable. If, on the other hand, results do differ significantly, then other factors must certainly be considered in the substitution of one scale for the other.

The r of .93 given as the median value of relationships between Forms L and M of the Binet for ages six to sixteen was used as a basis for this comparison. The corrected correlation coefficients reported in Table II were transformed to Fisher z scores and differences computed between the z scores equivalent to these correlations and that equivalent to a correlation of .93. From these differences, critical ratios were computed using the standard error of the difference between z's. The critical ratio found between the Binet r of .93 and the corrected r of .88 obtained between the WISC Verbal Scale and the Stanford-Binet was 1.74, significant at the 10% level of confidence; that for the corrected r of .74 between the WISC Performance Scale and the Binet was 4.37, significant at the .1% level of confidence; and that for the corrected r of .90 between the WISC Full Scale IQ and the Binet was significant below the 10% level of confidence.
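The comparison of correlations described above can be sketched with Fisher's z transformation. Two points in this sketch are assumptions rather than facts from the article: the standard error is taken as the usual form for two independent samples, and the size of the group behind the Form L and Form M reliability coefficient is not stated in this passage, so `N_RELIABILITY` is a hypothetical placeholder.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z transformation of a correlation coefficient."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def critical_ratio(r_a: float, n_a: int, r_b: float, n_b: int) -> float:
    """Difference between two z-transformed r's divided by its standard error.

    Assumes independent samples, so the standard error of the difference
    is sqrt(1/(n_a - 3) + 1/(n_b - 3)).
    """
    if n_a <= 3 or n_b <= 3:
        raise ValueError("each sample needs more than three cases")
    standard_error = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    return (fisher_z(r_a) - fisher_z(r_b)) / standard_error


N_STUDY = 50         # children tested in the present study
N_RELIABILITY = 200  # hypothetical placeholder; not given in the text

for scale, r in [("Verbal", 0.88), ("Performance", 0.74), ("Full", 0.90)]:
    ratio = critical_ratio(0.93, N_RELIABILITY, r, N_STUDY)
    print(f"{scale}: critical ratio = {ratio:.2f}")
```

The printed ratios depend on the placeholder sample size, so they should not be read as a reproduction of the published values; the ordering of the three scales, however, does not depend on it.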


Data presented in Table I would seem to indicate that IQ's obtained from this sample of children on the Stanford-Binet are quite similar to those reported for the standardization group (6).

So far as we are concerned with the relationship between the two scales, results are similar to those found by other investigators. Whether mental ages and scaled scores or IQ's are used, correlation coefficients are high. A squaring of the correlation coefficients in Table II shows the common variance between the WISC and the Binet to be 77% for the verbal scale and 81% for the full scale. So far as this sample is concerned, the relationship between IQ's obtained for eight- and nine-year olds with the WISC and the Form L Binet is not significantly different from the relationship between IQ's obtained on Forms L and M of the Binet. So far as total score is concerned then, the WISC might very well be substituted for the Binet or the Binet for the WISC. From results of this study, the same would seem to be true for the WISC Verbal Scale. This would not seem to be true, however, for the WISC Performance Scale since the relationship found differs significantly from that between Forms L and M at the .1% level of confidence.
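The common-variance percentages quoted in the discussion above follow from squaring a correlation coefficient. The short sketch below shows the arithmetic; the function name is illustrative, and the inputs are the corrected IQ correlations of .88 and .90, which are the values that reproduce the 77% and 81% figures.

```python
def common_variance_percent(r: float) -> float:
    """Percentage of variance two measures share, given their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return 100.0 * r * r


verbal_shared = common_variance_percent(0.88)  # about 77.4
full_shared = common_variance_percent(0.90)    # about 81.0
```

Read the other way, roughly a fifth of the variance in full-scale scores is not shared between the two instruments even at a correlation of .90.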

Clinically, it would seem that these findings have practical implications for the use of the various scales concerned. Total scores on the WISC or scores on the WISC Verbal Scale and the Binet would seem close enough to each other to offer practical interchangeability of the two scales. At the same time, the WISC Performance Scale would appear to be getting at a different facet of intelligence than is the total or verbal score of the WISC or the total score of the Binet. This study has not concerned itself with what these different scores may mean so far as prediction of various kinds of behavior is concerned. However, if broadened prediction is possible with the performance scale of the WISC while at the same time the total score and verbal score closely approximate that of a well-accepted tool, the WISC may prove to be a quite useful clinical instrument. Further research is necessary, of course, both to check findings of the present study and to determine the meaning of sub-scale scores of the WISC.


REFERENCES 


1. Arthur, Grace. A Point Scale of Performance Tests, Clinical Manual (New York: The Commonwealth Fund, 1943).

2. Cohen, B. D. and Collier, Mary J. "A Note on the WISC and Other Tests of Children Six to Eight Years Old," Journal of Consulting Psychology, XVI (1952), pp. 226-227.

3. Frandsen, A. N. and Higginson, J. B. "The Stanford-Binet and the Wechsler Intelligence Scale for Children," Journal of Consulting Psychology, XV (1951), pp. 236-238.

4. Krugman, Judith I. and others. "Pupil Functioning on the Stanford-Binet and the Wechsler Intelligence Scale for Children," Journal of Consulting Psychology, XV (1951), pp. 475-483.

5. Pastovic, J. J. and Guthrie, G. M. "Some Evidence on the Validity of the WISC," Journal of Consulting Psychology, XV (1951), pp. 385-386.

6. Terman, L. M. and Merrill, Maude A. Measuring Intelligence (New York: Houghton-Mifflin, 1937).

7. Wechsler, D. Wechsler Intelligence Scale for Children, Manual (New York: The Psychological Corporation, 1949).

8. Weider, A. and others. "The Wechsler Intelligence Scale for Children and the Revised Stanford-Binet," Journal of Consulting Psychology, XV (1951), pp. 330-333.


ERRATA


We regret that the following three tables were inadvertently left out of author Evan R. Keislar's article "Peer Group Rating of High School Pupils with High and Low School Marks," published in the June 1955 Journal of Experimental Education.


TABLE I


CORRELATIONS OF EACH OF TWELVE TRAIT RATINGS WITH OTIS I.Q. AND
SCHOOL MARKS FOR 126 BOYS AND 128 GIRLS


Otis I. Q. School Marks 


Girls Boys Girls 


MEE HEEL ТЕМЕН, 
-. 06 .10 -.22 


р ЫН = silent 10 17 21 06 
Ў acting - young actin . . . . 
$ Friendly - ти Е .03 214 209 „24% 
5. Likes schoolwork - dislikes schoolwork «44 38* "5% 2715ж 
6. Considerate - inconsiderate 17 . 09 .36* .26* 
T Popular - unpopular (with opposite sex) -.12 «19 -.07 -.10 
8 Persistent - not persistent .38* .23* .63* .49* 
9, Welcomed - ignored (by same sex) -.09 18 өз 17. 
10. uts studies first - puts studies last .30* .36 „70 „70 
11. Conceited - not conceited -.16 -.04 -.21% -.22 
12 Cheerful - sad -. 02 . 08 .05 09 
- Boys athletically competent - incompetent -.22 Ж -.07 E 


- Girls influential - not influential


*Significantly different from zero at the .01 level.

TABLE II 


DIFFERENCES ON TRAIT RATINGS BETWEEN TWO GROUPS OF HIGH SCHOOL 
GIRLS MATCHED FOR OTIS I.Q. BUT DIFFERING IN SCHOOL MARKS


Based on 27 girls in each group 
School Levelof 
Trait Marks Mean g D Sp t Signif. 
Talkative - Low 57.3 9.0 7.8 3.3 1 қ | 
2.9 


2.39 05 

Silent High 49.5 12. 
Old acting - Low 51.9 6.4 1,5 1.8 .87 z 

Young acting High 50.4 6.3 
Friendly - Low 53.0 7.0 

Unfriendly High 57.5 7.8 4.5 1.9 2.39 . 05 
Likes schoolwork - Low 39.9 5.6 

Dislikes schoolwork High 58.0 8.3 18.1 1.8 9.87 . 001 
Considerate - Low 49.4 6.7 

Inconsiderate High 53.0 5.2 3.6 1.6 2.24 . 05 
Popular - Low 55.0 9.9 6.2 2.2 2.84 .01 

Unpopular (with ор- High 48.8 9.9 

Posite sex) 

Persistent - Low 46.8 5.0 

Not persistent High 54.9 5.7 8.2 1.2 6. 66 . 001 
Welcomed - Low 52.3 4.1 

Ignored (by same High 54.0 6.2 1.7 1.4 1.23 ο. 

sex) 

Puts studies first - Low 44.4 4.4 

Puts studies last High 544 46 100 .94 10.62 001 
Conceited - Low 51.0. 6.3 зт 17 2.18 05 

Not conceited High 47.3 5.3 
Cheerful - Low 53.2 5.8 

Sad High 54.2 6.0 1.0 1..8 61 . 
Influential - Low 48.8 4.5 

Not influential High 53.4 5.9 


Е 4.6 1.2 3.79 .001 
Note: All figures reported have been rounded off to one decimal place except for 


the values of t. 


TABLE III 


DIFFERENCES ON TRAIT RATINGS BETWEEN TWO GROUPS OF HIGH SCHOOL 
BOYS MATCHED FOR OTIS I.Q. BUT DIFFERING IN SCHOOL MARKS 


ae 
Based on 35 boys in each group 
Level of 


School 
Trait Marks Mean σ D Sp t Signif. 
Talkative - Low 540 1L9 31 29 1.07 
Stlent High 50.8 11.6 
Old acting - Low 46.2 8.0 
Young acting High 508 81 46 21 2.35 05 
Friendly - 
Low 50.2 6.6 
Unfriendly на ма Ta aa іл іл 
Likes scho 
: olwork - Low 43.5 10.3 
Dislikes schoolwork High 57.3 9.2 13.8 2.2 6.31  .001* 
Considerate 
E Low 41.8 5.6 
Inconsiderate High 519 5.5 41 1 з 3.11 -0l 
Popular - Low 488 8.1 
npopular (with op- High 49.1 8. 1 .3 2.2 .13 
Posite sex) 5 
Persiste 
nt - Low 46.6 47 
Not persistent High 52.0 49 5 а @ 591 001 
Welcom 
ed - Low 494 67 
Ignored (by same High 52.8 6.8 3.4 1.7 1.98 
Sex) 
Puts studi 
udies first - Low 45.0 5.7 001 
uts studies last High 543 58 9? 14. 64 - 
Conceiteg - Low 51.5 5.4 3.0 13 2.36 «055 
Οἱ conceited High 48.5 5.1 
Cheert 
ul - 18 45 
Sad Sach ше pa е νυ 9 
Athlet 
ically ¢ 47.7 6.7 
A š ompetent Low 3.5 2.0 1.71 ... 
Note; decimal place except for 
te: АП figures reported have been rounded off to one 
the values of t. the hypothesis of normality could be rejected 


* 
F 
at the ^ distribution of trait eiue m 
+ 02 level but not at the - evel. —. А 
95 the distribution of scores the hypothesis of normali 


е . 05 level but not at the . 02 level. 


ty could be rejected at 


Journal of Experimental Education 


Volume XXIV 


December, 1955 


Number 2 


THE EFFECTS OF A "CAUSAL" TEACHER-
TRAINING PROGRAM AND CERTAIN CUR-
RICULAR CHANGES ON GRADE
SCHOOL CHILDREN*


RALPH H. OJEMANN, EUGENE E. LEVITT,
WILLIAM H. LYLE, Jr., MAXINE F. WHITESIDE


Child Welfare Research Station 
State University of Iowa 


The purpose of this paper is to report the results of a learning program designed to help the child develop a "causal" orientation toward his social environment. The learning program used in this study involved both the training of teachers and the use of certain special curricular content.

The meaning of the term "causal" as used here is detailed in an earlier publication. Briefly, it recognizes that human behavior is the result of many factors and that one can distinguish between an approach to a given behavior incident which recognizes and takes into account the factors that may have produced it, as compared with an approach that considers only the form of the behavior. The term "causally oriented curricular content" arises from the discovery that the content of school readers and texts is largely non-causally oriented (7, 8).

The importance of specifying the learning program as involving the training of teachers rests on the following. Previous data have suggested that teachers' behavior toward children is essentially non-causally oriented. Since our culture is for the most part likewise oriented and since teachers have grown up through that culture, this is not unexpected. But the tendency toward a non-causal orientation in the daily behavior of the teacher becomes important when we consider the problem of developing a causal orientation in the child. This may be explained as follows.

In teaching arithmetic we can conceive of a situation in which the teacher would teach the child to perform accurately the various number operations while at the same time he (the teacher) would make a number of "mistakes" on his income tax report. The child need never see these "mistakes" and thus they would not directly influence his learning.

But in teaching an approach to human behavior the situation is different. The teacher must of necessity interact with the pupil. Through the approaches he makes to the pupil he provides a demonstration from which the pupil learns. If he approaches the pupil in a non-causal way, the pupil is experiencing a demonstration of a non-causal approach.

Thus, in the area of human behavior the teacher teaches in two ways: he teaches through the content studied and through the daily demonstrations he provides. In a previous study (10) evidence was obtained indicating that it is difficult to develop a causal orientation if the regular classroom teacher and content remain essentially non-causally oriented and causal content is introduced for, say, one period a day by a trained but "imported" teacher.


* Preparation of this paper was supported by Research Grant MH-301 from the National Institute of Mental Health, U. S. Public Health Services.

Testing the effects of a learning program using trained teachers and causally oriented curricular content involved: (a) a training program for teachers, (b) a plan for changing curricular content, (c) an appropriate experimental design, and (d) the gathering of data and analyses of results. Each item will be briefly described in turn.

The subjects of this study were four teachers and their pupils, each classroom matched with two control groups. One of the teachers was from the fourth grade, one from the fifth and two from the sixth grade. All were from the school system of a midwest industrial town of about 75,000 population. Since this investigation is part of a long range program, it was desired to develop a group of trained teachers who gave promise of remaining in the system several years. Accordingly, teachers were chosen on this basis by the school administration in consultation with the investigators. Data relative to the experimental subjects will be presented in a description of the experimental design.


Teacher Training


Our plans involved providing teachers with one month of intensive work during the summer and following through with group conferences every three weeks during the school year. These conferences were intended to give the teachers opportunity to discuss any questions or problems which might arise during the year as closely as possible to the time they might arise.

The month's program of intensive work was set up under circumstances similar to the usual academic situation. Limited credit on a minor problems basis was allowed for those teachers who indicated that they wished to receive academic credit. The program was organized in terms of six units, all but one to be completed during the four week period. The description of the units, the time devoted to each, and the reason for their inclusion in the program are presented below:


Unit 1. Developmental Problems of the Normal Child, three hours per week. The primary purpose of this unit was to draw the teachers' attention to the fact that "having problems" does not necessarily make a child a problem child. Emphasis was upon the kinds of developmental tasks children face at various ages, the kinds of basic learnings which are necessary for proper handling of these tasks, and the problems which are created when tasks appropriate to a particular age level are not learned before the following level. Selected portions were assigned of F. Redl and W. W. Wattenberg, Mental Hygiene in Teaching; R. J. Havighurst, Human Development and Education; Association for Supervision and Curriculum Development Yearbook, 1950, Fostering Mental Health in Our Schools; and Gladys Jenkins et al., These Are Your Children. It was our intention for teachers to understand from these materials that children are continually facing problems and that problems are a necessary result of the child's expanding social environment. Instructors: S. L. Zelen and C. D. Smock.

Unit 2. Personal Problems of Everyday Life, five hours per week. This unit was set up, but not labeled, as 20 one-hour sessions of group psychotherapy. It was presented to the teachers as an opportunity to "extend the individual's understanding of the problems people ordinarily meet in their daily living; to acquaint the individual with general psychological principles which have maximum relevance to these problems both with regard to handling personal problems which exist currently and the off-setting of present behavior trends which could conceivably lead to future problems; and to assist in the development of techniques for meeting the frustrations which most of us normally encounter." The preparation of an extensive personal autobiography was required following closely the lines presented in Stogdill's mental hygiene workbook entitled Objective Personality Study. Extensive comments were made on the material contained in the units of the workbook which were intended to stimulate thinking about their personal experiences. The teachers were encouraged to explore the extent to which their own personal biases and predilections might structure the classroom situation in the hope that this might minimize the extent of influence of that bias. No individual sessions were held with members of the group except when they presented themselves to ask for individual discussion, at which time they were encouraged to raise their questions for discussion in the group. That is, an explicit attempt was made to focus attention on the group situation and to bring discussion material to the group rather than to take material away for individual sessions. Members of the group were assigned collateral readings from Philip Eisenberg, Why We Act As We Do, and from Hugh Cabot and Joseph A. Kahl, Human Relations, Volume I, "Concepts." These readings plus the autobiographical material provided the vehicle for group discussion. Instructor: W. H. Lyle.

Unit 3. Action Research in the Classroom, two hours per week. An essential part of the unit was an attempt to discourage the teacher from having too much confidence in her observations and her ability to predict from them. Data were presented on the problems involved in the determination of the reliability of observational techniques and the predictive efficiency of these observations. Some methods for placing her own observations in a research framework were presented and the teachers were encouraged to make their observations in a somewhat more systematic manner. Selected papers were used as background material and no outside reading was assigned. Instructor: E. E. Levitt.

Unit 4. The Causal Approach to an Understanding of Human Behavior. The primary purpose of this unit was to acquaint the teachers with the background of the


December, 1955) 


[...] its origin, and its present status. An additional function of
this unit was to acquaint them with the special materials which had
been developed by the project. Instructor: R. H. Ojemann.

Unit 5. Meeting Classroom Problems—three hours per week. This unit was
under the direction of an experienced classroom teacher who had been
working with the project for the past [...] years and had had direct
experience in classroom situations. This was a technique-centered unit.
That is, our attempt was to help the teacher to utilize known
techniques in the handling of classroom problems and to develop special
techniques which would allow her to meet the daily problems arising in
the classroom. “Typical” classroom situations were presented to the
teachers to give them some [...] in understanding what would be surface
ways of handling these situations as opposed to possible causal
methods. The probable effects of the methods were compared. Our concern
was with individual needs, but it was our feeling that most would be
accomplished if group and individual needs were met jointly. Many
previous attempts to encourage the teacher to take individual needs of
children into consideration have failed to consider that this can only
be accomplished, or at least accomplished most effectively, within a
framework of good group control. [...] the constructive forces of the
group are at the disposal of the teacher in meeting individual
problems. All of the materials developed and used by the project
previously were discussed with the teachers. In a sense, this unit
might be considered as a practicum companion for Unit 1, since
assistance in the handling of developmental tasks formed an important
part of this unit. Instructor: Mrs. Maxine Whiteside.

Unit 6. Practicum in the Preparation of Special Materials—two hours per
week. This unit represented our attempt to insure two-way
communication. The project personnel felt the need of assistance in the
adaptation of materials to be used. It was our belief that those
individuals closer to the teaching situation might adapt materials to
the classroom situation more effectively both from the point of view of
interest and appropriateness. The participating teachers were
encouraged to write materials to replace those we had developed, to
extend such materials, and to develop materials utilizing the strengths
in their own professional background. All materials were discussed with
the project member acting as advisor and the joint suggestions
incorporated. For the most part, this proved to


OJEMANN ET AL. 


97 


be a continuing project on which the teachers 
worked during the entire year. Instructors: 
Staff. 
Conferences with Experimental Teachers
During School Year


Twelve meetings were scheduled during the 
school year or approximately one meeting every 
three weeks. The general purpose of these meet-
ings was to provide an opportunity for the teach-
ers to ask questions concerning the classroom
work they were doing. It was recognized that
the actual practicing of the causal approach in 
the classroom would give rise to more specific 
questions which could not be fully anticipated 
during the summer training program. In addi- 
tion, the meetings furnished the staff with an op- 
portunity to discuss additional topics with the 
teachers. 

One or more members of the staff led discus- 
sions on various topics which can be grouped un- 
der seven general headings. 


Materials—At each meeting the teachers 
were given an opportunity to ask any questions 
about the materials they were using. At six of 
the meetings, questions were presented and dis- 
cussed. Other meetings were used for extend- 
ing teachers’ background in child behavior and 
discussions relative to practicing the causal ap- 
proach in the classroom. 

At one meeting toward the close of the pro- 
gram the teachers were asked to suggest, on the
basis of their experience, the teaching sequence 
for using the materials. A discussion of the 
merits of teaching one type of material before 
another and the like, resulted in agreement as 
to the most useful sequence according to their 
classroom organization. 

Pupils—One of the main topics of the first 
meeting was a discussion of specific classroom 
situations which the teachers had faced, com-
ments on the surface and causal methods to
handle such situations plus a description of the 
way the teachers had handled the situations. 
Part of nine other meetings was spent discuss- 
ing this topic. In this way, the teachers were 
given an opportunity to check their own behavior 
as surface or causal as well as the behavior of
the pupils. 

For example, one teacher had been observ-
ing a girl who seemed to play with no one and who
stayed by herself but had made it known that
she wanted to associate with others. Meetings
with the parent and conversations with the pupil
were described, after which teachers asked ques-
tions to obtain additional information, such as
the teacher's hypothesis concerning the causes
of the described behavior. The group then made
and evaluated recommendations for possible
methods of dealing with this situation.


98 JOURNAL OF EXPERIMENTAL EDUCATION 


Records—At the seventh and twelfth meet-
ings time was devoted to discussing what infor-
mation the teachers would like to have about
their pupils in order to better practice the caus-
al approach in the classroom.
Additional background in child behavior—
The teachers were given an opportunity to ques-
tion members of the staff relative to the findings
of investigations of a variety of behavior patterns.
The teachers’ questions arose primarily from
observations of pupils in their rooms which
prompted them to inquire about studies which might
further their understanding of the pupils. Though
some background had been provided during the
summer program, a re-presentation was advan-
tageous because of the teachers’ actual observa-
tion of the behavior being discussed.

For example, one question was “Would
you discuss the ‘shy child’ in general and then 
consider a specific case which I will describe?” 

Outcomes—At the second meeting the teach-
ers were asked to assist in the preparation of a
“Tentative List of Outcomes Which Might Be Ex-
pected as a Result of Teaching the Causal Ap-
proach.” After a tentative list had been pre-
pared they were asked to refer to it often during
the school year and then, toward the end of the
second semester, to select the outcomes which
they felt might be a result of teaching the causal
approach at their particular grade level. The
purpose of this exercise was to utilize the teach-
ers’ experiences in making a tentative estima-
tion as to what aspects of the causal approach
may be developed at the respective grade levels.

Evaluation—Toward the middle of the year
the teachers were asked to evaluate the training
program of the previous summer by answering
the following questions:

1. What do you feel were the most valuable
parts of the training program last summer?
Please list at least two or


Results—The last meeting was primarily con-
cerned with the presentation of the statistical an-
alysis of the results of the program.


Development of Curricular Content 


As indicated above, previous studies had dem-
onstrated that content dealing with human behav-
ior as currently found in elementary readers, so-
cial studies and health texts is essentially surface
or non-causal in nature. It was, therefore, nec-
essary to develop more causally oriented content.
To accomplish this a variety of materials were
prepared. Some of these materials were avail-
able from previous studies; some were prepared
during the course of this investigation.

In describing the preparation of causal con-
tent a statement of the concepts and apprecia-
tions which constitute the goals of the learning
program may facilitate discussion. We wish to
help the child to understand and appreciate more
about how his social environment operates. He
is taught that there are many ways in which a
given behavior pattern may develop, that causes
are complex, that people are faced with many
different situations which they are trying to work
out, that they use a variety of methods for this,
that additional methods may be available and that
all the methods may be considered in terms of
the effects they have.

In contrast to such concepts as these, chil-
dren under present usual conditions are taught
essentially what people do and primarily a judg-
mental approach to the behavior without first
seeking an understanding of how it came about.

The situation appears somewhat comparable 
to that which prevailed in man's reaction to his 
physical environment. At one time man took a 
more or less arbitrary approach to his physical 
environment. It is only relatively recently when 
we consider the span of human history that he 
learned a more dynamic approach. 

A list of some of the elementary concepts rep-
resented in the causal orientation has also been
reported elsewhere (4). 

The nature of the curricular content is further 
revealed by the specific materials developed. It
is possible here only to list the various types.
Readers who are interested in examining the ma-
terials at first hand may obtain copies from the
investigators. 

The types of materials are as follows: 


1. Introduction to the causal approach by the story method—the
“Teachers Manual for Behavior Materials in the Primary Grades” is a
collection of twenty-seven stories grouped in sections for use at
different grade levels. Each story deals with a particular behavior
pattern. Preceding each story the manual supplies some background for
the teachers. These materials have been described in earlier
publications. The story is introduced and read by or to the pupils and
is followed by a discussion designed to guide the pupils into thinking
of the “causes for the behavior” which were described in the story. The
teacher keeps two general questions in mind during discussions:
1) Did the children understand the difference between thinking of
causes and not thinking of causes?
2) Did the children gain ideas of ways to meet ordinary problems so as
to help each participant grow?
Stories for use in the intermediate grades were also written with a
broader scope than those for use in the primary grades.


2. Expository presentation of causal approach


(Vol. 24 




as it applies both to development of behavior and the consideration of
the effects of behavior—two [...] bearing the titles “Two Ways to Look
at How People Act” and “When We Have to Decide” provide in expository
form the differences between the surface and causal approaches.

3. A series of workbooks which served as introductory units to social
studies and health:
Book I: How considering causes affects our reaction to behavior
Books II and III: How people work out feelings of self-respect and
“counting-for-something”
Books IV and V: How physical differences, experiences and opportunities
may affect different people
Book VI: How past experiences affect methods people use
The booklets provided a variety of exercises to be written out,
unfinished situations for which endings were to be written or
role-played and the like.

4. Revised units in history and geography—sections of history and
geography were revised to incorporate the elementary principles of
human behavior. For example, the unit on “The [...]” was revised to
include discussions of how geographical and cultural conditions may
influence the situations people face and the methods they employ to
work them out.

5. Material on the use of the room council—the material prepared by
Stiles (9) for helping pupils apply the causal approach in room council
discussions has been described in previous publications.


In the preparation of these materials, consideration was given to pupil
interest, pupil experiences and vocabulary burden. The Dolch “Combined
Word List” and Green's “Iowa Spelling Scale” aided in checking
vocabulary of material to be read by pupils. Listening vocabulary was
scaled higher than reading vocabulary in recognition of the differences
between the two.

Relating new situations or experiences to familiar ones is a technique
often used by teachers. This practice was taken into consideration in
the writing of materials with one precaution. As is explained in the
teachers’ manual of primary [...], “Since every child is engaged in
working out his own problems, it was felt that if the material dealt
only with school and community situations of children like themselves,
they may become so engrossed in their immediate problems that they miss
the larger more objective appreciation. Accordingly, situations
involving children older and younger than themselves, and children from
quite different environments as well as some situations involving
children like themselves are included.”




Since the incorporation of the causal approach 
in teaching materials is relatively new, readers 
who are interested in detail are encouraged to 
examine the original materials. Particular ques- 
tions vary with the background of the reader and 
it is not possible to anticipate all of them. As a
guiding principle it may be helpful to keep in
mind that the purpose of the learning program is
to help the child gain more appreciation of how his
social environment operates just as physical sci- 
ence attempts to build an appreciation of how the 
physical environment operates. 


Experimental Design and Analysis of Results 


The evaluation of the teacher training pro-
gram was actually concerned with pupil devel-
opment rather than with teacher development per
se. There were two reasons for this approach:
a) the primary motive for the training of the
teachers was to affect the pupils in certain ways,
b) the number of teachers was obviously too
small to permit any reliable measurement of 
teacher characteristics directly. The evaluation 
procedure is described in detail in the following 
sections. 

Control teachers—Two control teachers were
selected for each of the four experimental group
teachers. The control teachers were matched
with the respective experimental teacher, inso-
far as it was feasible, on a number of dimensions
which might affect experimental results. These
variables were age, sex, number of years of
teaching experience, and educational level. The
data are shown in Table I. The control teachers
were selected from the same school system. The
twelve teachers represented ten different elem-
entary schools.
It would have been desirable to have been able
to control other potentially pertinent factors like
teaching ability. However, an analysis of the
available literature indicates that such expres-
sions as teaching ability are rather nebulous and
not easily defined or measured. It seemed pref-
erable to deal with concrete measures and to as-
sume that meaningful uncontrolled variables
were randomly distributed among the groups of
teachers.

In addition to their training, the experimental
teachers had been provided with various mater-
ials for use in teaching the “causal approach” in
the classroom. The purpose of the double con-
trol group was to attempt to determine the effec-
tiveness of these materials alone. Toward this
end, the teachers in Control₁ were invited to se-
cure and make use of such of the materials as
they wished. The purposes and modes of use of
the materials were outlined briefly. Their use
was not, however, required of the Control₁
teachers and no attempt was made to insure
that they were used. A check on the kinds of




TABLE I

COMPARATIVE BACKGROUND DATA OF EXPERIMENTAL AND
CONTROL TEACHERS

Teacher           Age   Sex     Years Teaching   Educational Level

Fourth Grade
Experimental      26    F       [...]            [...]
Control₁          27    F       [...]            [...]
Control₂          26    F       [...]            [...]

Fifth Grade
Experimental      52    F       [...]            [...]
Control₁          50    F       [...]            [...]
Control₂          52    F       [...]            [...]

Sixth Grade (I)
Experimental      44    F       [...]            [...]
Control₁          40    F       [...]            [...]
Control₂          50    F       [...]            [...]

Sixth Grade (II)
Experimental      26    [...]   [...]            [...]
Control₁          28    [...]   [...]            [...]
Control₂          26    [...]   [...]            [...]


TABLE II

MEAN IQ SCORES BY CLASS

                Fourth    Fifth     Sixth        Sixth
                Grade     Grade     Grade (I)    Grade (II)    Total
Experimental    110.89    108.47    111.74       105.68        109.20
Control₁        109.76    110.08    101.84       101.20        105.72
Control₂        106.31    109.94    108.63       107.88        108.19
Total           109.53    109.53    106.78       104.40        107.48




materials actually used and the number of hours devoted to them was
made at the conclusion of the investigation (see page 110).

[...] of the control teachers expressed a [...] to be in Control₁,
i.e., were apparently interested in the materials. Teachers were
assigned at random to the two control groups. The untreated control
group, i.e., the one in which the teachers had no contact whatsoever
with the [...], is designated as Control₂.
Pupils—The matching of teachers need not necessarily have any effect on
the disposition of pupils within the various groups of classes. It was
necessary to be reasonably certain that the pupils in one or another of
the groups were not [...] in any relevant way. Age and sex were
controlled automatically by the methods of assignment of pupils to
classes in the school system. [...] differences in the sex ratio might
occur, but earlier work with the tests to be used has revealed no
systematic sex differences in performance.
Intelligence is likely to be an important factor, as it is in studies
of this type. IQ scores on the Otis Self-Administering Test,
Intermediate [...], were secured from school records for all pupils who
participated in the testing program. The mean IQ scores by class are
shown in Table II. The results of a treatment-by-grades analysis of
variance of IQ scores are shown in Table III.

The analysis of variance¹* of IQ scores yields non-significant results.
There are hence no significant differences in intelligence either among
the three treatment groups, or among grades, or among individual
classes. We shall not, therefore, be able to attribute differences in
performance to intelligence.

We have been able to control age, sex, and intelligence among the
pupils in the various classes. It is very possible that there are
other, unknown variables which are pertinent to the experimental
design, a not uncommon occurrence. We shall assume that the pupils are
randomly distributed among the groups with respect to such variables.
The tests—Two instruments were used in the evaluation proper. The first
of these, the Problem Situations Test (PST), has been the subject of
considerable investigation. Its development is described elsewhere (5).
The PST is a 22-item multiple-choice test in which the subject is faced
with a number of instances of misbehaviors or [...] of children and is
required to deal with them either from the point of view of an
authority or from his own point of view. There are six possible
responses for each situation, three punitive and three non-punitive.
The punitive responses prescribe verbal or physical




punishment, deprivation, or coercion. The re- 
sponses were obtained from an open-end form of 
the test administered earlier to a group of fifth 
grade children. 

The PST is considered to be a measure of
punitiveness in the child, that is, his willingness 
to be immediately punitive in a hypothetical situ- 
ation where no retaliation is anticipated. The 
score for punitiveness is the number of punitive 
responses to the 22 situations. The PST has 
been shown to be related to authoritarianism and 
parental disciplinary methods (6) and to extra- 
punitiveness and intrapunitiveness as measured 
by the Rosenzweig Picture-Frustration Study (3). 
The reliability of the PST has been estimated as
.77 using the Kuder-Richardson formula 20,
based on data obtained from the earlier studies. 
In a sample of fifth grade pupils the correlation 
between the PST and IQ was found to be -.29 (6) 
which indicates that only a small fraction of the 
variance of the test is due to intelligence. 
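
The two statistics cited in this paragraph can be illustrated with a short Python sketch. It is a reader's illustration, not the authors' computation: the function name `kr20` and the toy response matrix are assumptions introduced here, and the population (divide-by-N) variance is assumed for the total scores. The last lines square the reported correlation of -.29 to show the share of PST variance associated with IQ.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    item_scores: one row per subject, each row a list of 0/1 item scores.
    Uses the population variance of total scores (an assumption here).
    """
    n_subjects = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_subjects
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Hypothetical responses (1 = punitive choice): 5 subjects by 4 items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
toy_reliability = kr20(toy)

# Share of PST variance associated with IQ, from the reported r of -.29.
shared_variance = (-0.29) ** 2

print(round(toy_reliability, 3), round(shared_variance, 4))
```

On the reported correlation, roughly eight per cent of the variance is shared with IQ, which is consistent with the statement that only a small fraction is due to intelligence.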

The second instrument was the Causal Test
(CT). The CT has not yet been widely investigat-
ed, though it appears to have considerable prom-
ise. It is a 30-item test of the true-false type, the individ-
ual items being based on eight descriptions of
behavior. The test attempts to tap the child's
awareness of the dynamic, complex, variable na-
ture of human motivation, though it does not re-
quire that he have any specific knowledge of the
causes of behavior themselves. This awareness
and its hypothesized behavioral concomitants have
been called “causality” (4). The test is scored
inversely, i.e., for non-causality, so that the
higher the score, the less causal the subject.
This was done so that the CT would vary direct-
ly with the PST. The CT has been found to cor-
relate -.36 with intelligence in fifth grade pupils
and to have a Kuder-Richardson reliability of .63.
The latter is rather low, but it should be borne
in mind that attitude and personality tests with
young children cannot be expected to have the relia-
bilities of such measures with adults.

A more detailed description of the CT will be 
found in another forthcoming publication (2). 

Experimental procedure—The tests were ad-
ministered to all twelve classes on September 29
and 30, 1954, approximately three weeks after
the beginning of the fall semester. (The formal
body of the teacher training program had ended
in July, 1954.) This administration will be re-
ferred to as the pre-testing. The second admin-
istration, or post-test, took place on April 12
and 13, 1955, approximately six and one-half
months later. The tests were administered by
three regular project staff members, each of
whom tested the same four classes in September
and April. No administrator tested more than
two classes in any one treatment group. The


* Footnotes will be found at the end of this article.




TABLE III

ANALYSIS OF VARIANCE OF IQ SCORES

Source          d.f.    SS          MS         F        P
Treatments        2       565.95    282.975    1.762    >.10
Grades            3      1028.78    342.927    2.135    <.10
T x G             6      1273.21    212.202    1.321    >.20
Within Cells    228     36621.96    160.623
Total           239     39489.90
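
The F ratios in Table III can be checked from the printed sums of squares and degrees of freedom. The Python sketch below is a reader's check, not the authors' procedure; the function name `f_ratios` is an assumption introduced here. Each mean square is SS divided by d.f., and each F is the effect mean square divided by the within-cells mean square.

```python
def f_ratios(effects, within):
    """Return {effect: (mean square, F)} from (SS, d.f.) pairs.

    effects: dict mapping effect name -> (sum of squares, degrees of freedom)
    within:  (sum of squares, degrees of freedom) for the within-cells term
    """
    ms_within = within[0] / within[1]
    out = {}
    for name, (ss, df) in effects.items():
        ms = ss / df
        out[name] = (ms, ms / ms_within)
    return out


# Values as printed in Table III (IQ scores).
table_iii = {
    "Treatments": (565.95, 2),
    "Grades": (1028.78, 3),
    "T x G": (1273.21, 6),
}
within_cells = (36621.96, 228)

results = f_ratios(table_iii, within_cells)
for name, (ms, f) in results.items():
    print(f"{name:<12} MS = {ms:8.3f}  F = {f:.3f}")
```

The recomputed ratios agree with the printed F column to three decimals, and the within-cells mean square comes to about 160.62.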


TABLE IV

LOSS OF SUBJECTS DUE TO EQUATING CLASS N's OVER
TREATMENT GROUPS

Grade            Experimental    Control₁    Control₂    Total

Problem Situations Test
Fourth           19 (-0)         23 (-1)     17 (-1)
Fifth            21 (-2)         22 (-0)     16 (-0)
Sixth (I)        20 (-1)         31 (-9)     24 (-8)
Sixth (II)       24 (-5)         28 (-6)     24 (-8)
Total
Eliminated        8              16          17          41
N Remaining
Per Class        19              22          16

Causal Test
Fourth           19 (-0)         25 (-0)     16 (-0)
Fifth            26 (-7)         26 (-1)     17 (-1)
Sixth (I)        20 (-1)         31 (-6)     23 (-7)
Sixth (II)       23 (-4)         28 (-3)     25 (-9)
Total
Eliminated       12              10          17          39
N Remaining
Per Class        19              25          16




tests were administered in the same order and no time limits were set.
Despite this leniency, there were a number of incomplete protocols in
nearly every class, especially for the pre-testing. These were
invariably discarded.
The number of subjects who successfully completed both pre- and
post-tests varied from class to class for both of the measures. In
order to avoid complicating an already complex statistical analysis, it
was necessary to equate the numbers of subjects either over the
treatments, or over the grade levels. The former was the technique
chosen since it involved the smaller loss of subjects for both tests.
All subjects were first listed randomly, then the necessary number was
eliminated according to a table of random numbers. Table IV shows the
original Ns, the number eliminated, and the remaining N for each class
and test.
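
The equating procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: they worked from a table of random numbers, whereas the sketch substitutes Python's seeded `random.Random`; the function name `equate_class_sizes` and the subject labels are hypothetical. The class sizes are the original PST Ns of the experimental classes in Table IV.

```python
import random


def equate_class_sizes(classes, rng):
    """Randomly drop subjects so every class in a treatment has the same N.

    classes: dict mapping class label -> list of subject identifiers
    rng:     a random.Random instance (seeded for reproducibility)
    Returns (kept, eliminated) dicts keyed by class label.
    """
    target = min(len(subjects) for subjects in classes.values())
    kept, eliminated = {}, {}
    for label, subjects in classes.items():
        shuffled = subjects[:]
        rng.shuffle(shuffled)            # subjects "first listed randomly"
        kept[label] = shuffled[:target]  # keep the first `target` subjects
        eliminated[label] = shuffled[target:]
    return kept, eliminated


# Original PST Ns for the experimental classes, from Table IV.
original_n = {"Fourth": 19, "Fifth": 21, "Sixth (I)": 20, "Sixth (II)": 24}
classes = {g: [f"{g}-{i}" for i in range(n)] for g, n in original_n.items()}

kept, eliminated = equate_class_sizes(classes, random.Random(0))
dropped = {g: len(v) for g, v in eliminated.items()}
print(dropped)

# Totals after equating: four classes per treatment.
total_pst = 4 * (19 + 22 + 16)
total_ct = 4 * (19 + 25 + 16)
print(total_pst, total_ct)
```

Which subjects are dropped depends on the seed, but the number dropped per class does not; it matches the experimental column of Table IV, and the totals match the 228 and 240 subjects reported for the PST and CT.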
The elimination of subjects changed the mean scores for each class only
slightly, which is the anticipated result when subjects are randomly
rejected. For the comparison of pre- and post-test results there are 19
subjects in each experimental class, 16 subjects in each Control₂
class, and 22 subjects in Control₁ classes for the PST, and 25 in
Control₁ classes for the CT. The total number of subjects will be 228
for the PST and 240 for the CT.
If the teacher training program has had the desired results, we would
expect that the pupils taught by the experimental teachers would show
greater [...] in PST and CT scores than the pupils taught by the
control teachers. If the use of the materials alone has any significant
effect, we would also expect the Control₁ classes to improve more than
the Control₂ classes although, of course, this difference is not basic
to the evaluation of the training program.
Statistical analysis—In a design of this type we would expect to find
some random (though insignificant) differences in pre-test scores among
the treatment groups. Since these pre-test differences may have some
effect on the post-test scores, it would be desirable to eliminate them
by means of some statistical technique. Hence the appropriate
statistical procedure is an analysis of covariance.
If the data do not justify the assumptions necessary for the
application of a covariance analysis, there remain two alternate
analyses. The first of these is simply to accept the post-test scores
as a valid index of treatment effects on the assumption that the lack
of significance of differences among pre-test scores means that the
groups were actually equated prior to treatment. The second is a sign
test (1) based on pre minus post-test scores, a non-parametric method
requiring [...] assumptions.

Results—The PST: Pre-Test—The mean pre-test scores on the PST are shown
in Table V. Obviously there are arithmetic differences between class
means, although the mean scores for the three treatments, 5.17 for the
experimental group, 5.86 for Control₁ and 5.38 for Control₂, are quite
close together. The results of an analysis of variance of these scores
are shown in Table VI.
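
The marginal means of Table V can be reproduced from the class means, on the assumption (inferred here from the equated Ns of 19, 22 and 16, not stated by the authors) that grade totals are weighted by class size while treatment totals are simple averages of four equal-sized classes. A minimal Python sketch, with `weighted_mean` as a hypothetical helper name:

```python
def weighted_mean(means, ns):
    """Mean of several class means, weighted by class size."""
    return sum(m * n for m, n in zip(means, ns)) / sum(ns)


# Equated PST class sizes: Experimental, Control 1, Control 2.
class_n = [19, 22, 16]

# Fourth-grade pre-test class means from Table V.
fourth_total = weighted_mean([5.05, 8.32, 5.81], class_n)

# Experimental treatment: four classes of equal N, so a simple average.
experimental_total = sum([5.05, 4.95, 4.74, 5.95]) / 4

print(round(fourth_total, 2), round(experimental_total, 2))
```

Both values agree with the printed totals (6.53 and 5.17) to two decimals.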

The analysis reveals no significant difference
between the treatment means (F = 0.580) and no
significant interaction (F = 1.915). Differences
between grades are significant (F = 3.516, P <
.02, > .01) but this is of no consequence for the
experimental design. The absence of differences
between treatment means and the lack of inter-
action indicate that random sampling has been
accomplished. That is, the classes have been
assigned at random to the treatment groups and
are thus well matched. We may conclude that
this phase of the testing with the PST has been
successful.

The PST: Post-Test—The post-test means 
are shown in Table VII. The pre-test means
of Table V are included for comparative pur- 
poses. 

The experimental group, with a pre-test
mean of 5.17, dropped to 2.39 on the post-test.
Control₁ fell from 5.86 to 5.14, a change of less
than three-quarters of a point. Control₂ rose
slightly, from 5.38 to 5.67. The experimental
classes show a unanimous decrease in mean
score, the smallest decrease, that for the fourth
grade, being over 1.25 points. Three of the four
classes in Control₁ show decrements, although
the overall decrease is much less than that for
the experimental group. Two of the Control₂
classes show increases and two show decreases,
the net being an increase of 0.29 points.
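
The statement about the fourth grade can be checked against the treatment totals. Because every experimental class has the same N after equating, the treatment mean is the simple average of its four class means, so any one class mean can be backed out from the total and the other three. The Python sketch below is a reader's check; `missing_class_mean` is a hypothetical helper name, and because the inputs are rounded to two decimals the result is approximate.

```python
def missing_class_mean(total_mean, other_means):
    """Back out one class mean from the mean of equal-sized classes."""
    n_classes = len(other_means) + 1
    return n_classes * total_mean - sum(other_means)


# Experimental group, PST post-test: treatment total and the fifth,
# sixth (I) and sixth (II) class means from Table VII.
fourth_post = missing_class_mean(2.39, [2.32, 2.00, 1.53])

fourth_pre = 5.05  # fourth-grade experimental pre-test mean, Table V
decrease = fourth_pre - fourth_post

print(round(fourth_post, 2), round(decrease, 2))
```

The implied fourth-grade post-test mean is about 3.71, a decrease of about 1.34 points, which agrees with the statement that the smallest decrease is over 1.25 points.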

We now proceed to an analysis of variance of
the post-test scores, which is shown in Table
VIII.

We find on post-test that the difference be-
tween treatment means is now highly signifi-
cant (F = 23.4, P < .001). The differences by
grades remain significant, though of no conse-
quence. The interaction also remains insignifi-
cant.


Before we can proceed to adjust the post-test
scores by covariance, it is necessary to test for
homogeneity of regression, a key assumption in
the application of covariance. For this test we
break down the adjusted within cells sum of
squares for the post-test scores into two compon-
ents, the sum of squares for differences among
group regression lines and the sum of squares for
deviations from the group regression. The mean
square for the former divided by the mean square
for the latter constitutes the F-ratio for the test
of homogeneity of regression. The degrees of
freedom are the number of regressions minus
one for the numerator and N minus twice the
number of regressions for the denominator.

For the PST, the MS for differences among



TABLE V

MEAN PRE-TEST SCORES ON THE PROBLEM SITUATIONS TEST

Grade         Experimental    Control₁    Control₂    Total
Fourth        5.05            8.32        5.81        6.53
Fifth         4.95            3.32        4.50        4.19
Sixth (I)     4.74            7.00        6.63        6.14
Sixth (II)    5.95            4.82        4.56        5.12
Total         5.17            5.86        5.38        5.496


TABLE VI

ANALYSIS OF VARIANCE OF PRE-TEST SCORES ON THE PST

Source          d.f.    SS         MS        F        P
Treatments        2       20.86    10.430    0.580    >.20
Grades            3      188.89    62.963    3.516    <.02 >.01
T x G             6      205.78    34.297    1.915    <.10 >.05
Within Cells    216     3868.47    17.910
Total           227     4284.00


TABLE VII

MEAN PRE-TEST AND POST-TEST SCORES ON THE PST

              Experimental     Control₁         Control₂         Total
Grade         Pre     Post     Pre     Post     Pre     Post     Pre      Post
Fourth        5.05    [...]    8.32    [...]    5.81    [...]    6.53     [...]
Fifth         4.95    2.32     3.32    3.05     4.50    5.13     4.19     3.39
Sixth (I)     4.74    2.00     7.00    5.77     6.63    6.06     6.14     4.60
Sixth (II)    5.95    1.53     4.82    5.09     4.56    4.06     5.12     3.61
Total         5.17    2.39     5.86    5.14     5.38    5.67     5.496    4.373




group regressions is 29.639, the MS for deviations from group
regressions is 5.527. The F-ratio is 5.363, which is significant below
the [...] level for d.f.'s of 11 and 204. We thus reject the null
hypothesis and conclude that heterogeneity of regression exists among
the cells.
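
The reported F-ratio and its degrees of freedom follow directly from the rule stated earlier. The Python sketch below is a reader's check: the function name `homogeneity_f` is hypothetical, and the count of twelve regressions (three treatments by four grades, one per class) is an inference from the reported 11 degrees of freedom rather than a figure given by the authors.

```python
def homogeneity_f(ms_among, ms_deviations, n_regressions, n_subjects):
    """F-ratio and d.f. for the test of homogeneity of regression.

    Numerator d.f.:   number of regressions minus one.
    Denominator d.f.: N minus twice the number of regressions.
    """
    f = ms_among / ms_deviations
    df_num = n_regressions - 1
    df_den = n_subjects - 2 * n_regressions
    return f, df_num, df_den


# Mean squares as reported for the PST; N = 228 subjects in 12 classes.
f_value, df_num, df_den = homogeneity_f(
    29.639, 5.527, n_regressions=12, n_subjects=228
)
print(round(f_value, 3), df_num, df_den)
```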
Regretfully, we are forced to abandon the analysis of covariance.
Simply for purposes of completeness, it might be noted that the
covariance analysis would not have changed any of the F values in Table
VIII very much.
Under the heading “Statistical analysis”, two alternate procedures were
suggested in the event that the covariance analysis proved
inappropriate. The first of these was to interpret the analysis in
Table VI, which shows no significant differences between treatment
groups, as indicating that the treatment groups were equated on the
pre-test. Statistically, this is literally true since the arithmetic
differences are random. With this interpretation, we may now consider
the analysis of the post-test scores, as shown in Table VIII, as our
critical test. This type of experimental analysis is quite common, in
fact more common than covariance.
Table VIII shows that the differences among treatments and among grades
are significant, the respective Fs being 23.4 and 7.625, both
significant beyond the .001 level. The interaction is not significant.
The next step is to test the differences between pairs of treatment
means. These means will be found in Table VII. They are 2.39 for the
experimental group, 5.14 for Control₁ and 5.67 for Control₂. The
t-tests of differences between means are shown in Table IX.
The experimental group is clearly lower in mean score than either of
the two controls. The control groups, however, do not differ from each
other. This general finding is also true of the individual class means
as shown in Table VII. In each case the experimental class has the
lowest mean. In three of the four grade levels, Control₁ is lower than
Control₂ though the differences are numerically small.
The second suggestion was a sign test in which each pair of pre-test
and post-test scores is compared for direction of change. A plus would
indicate that the post-test score was larger than the pre-test, a minus
the reverse, and a zero, no change. A chi-square is then applied to the
frequencies.

The chi-square obtained from Table X is 33.[...], which is significant
below the .0001 level for four degrees of freedom. The results are
clearly in favor of the experimental group which has the largest number
of minuses and the smallest number of plusses. Control₁ has the next
largest number of minuses and the next smallest


OJEMANN ET AL 105 


number of plusses. The trend is revealed more 


clearly by breaking down Table X into its three 
individual chi-squares, of which only two need 
be computed for our purposes. Comparing the 
experimental group with Control,, we obtain a 
chi-square of 13.13, which is significant below 
the . 005 level for d.f. = 2. Comparing Control, 
with Control,, the chi-square is 8.21, d.f. = 2, 
and Р = <.02>.01. In other words, the exper- 
imental group appears to have been most affect- 
ed by the treatment, Control, next most affect- 
ed and Control, least affected. Control, in fact 
shows almost exactly the number of minuses that 
would be expected by chance alone. 
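
The chi-square computations described above can be sketched in a few lines. This is a minimal illustration and not the authors' own procedure: it assumes the ordinary Pearson chi-square for a contingency table, takes the plus, minus and zero frequencies as they are printed in Table X, and assumes that the second and third rows of that table are Control₁ and Control₂. The function and variable names are hypothetical. The statistics it yields fall close to the values reported in the text; small differences may reflect rounding or transcription.

```python
def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Frequencies of plus, minus and zero signs as printed in Table X.
# The row labels for the two controls are an assumption.
experimental = [6, 57, 13]
control_1 = [27, 48, 13]
control_2 = [31, 20, 13]

overall, overall_df = chi_square([experimental, control_1, control_2])
exp_vs_c1, pair_df = chi_square([experimental, control_1])
c1_vs_c2, _ = chi_square([control_1, control_2])

print(f"overall: {overall:.2f} with {overall_df} d.f.")
print(f"experimental vs Control 1: {exp_vs_c1:.2f} with {pair_df} d.f.")
print(f"Control 1 vs Control 2: {c1_vs_c2:.2f} with {pair_df} d.f.")
```

Breaking the 3 x 3 table into 2 x 3 tables, as the text does, simply means passing two rows at a time to the same function.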

Reliability of the PST. An estimate of test-retest reliability can
be obtained by correlating the pre- and post-test scores for
Control₂, the untreated control group. For this purpose we can
utilize data from the abandoned covariance analysis, a procedure
which will provide an over-all r with systematic differences among
grade means eliminated.

The test-retest correlation turns out to be .71. This is a
respectable reliability with a group which includes a fair smattering
of 9-year-olds. Furthermore, the hiatus between test and re-test was
over six months, whereas it is customarily no more than a week or two
for test-retest reliabilities. The unusually long gap ordinarily has
a tendency to attenuate the correlation.

The CT: Pre-Test. The mean pre-test scores for the CT are shown in
Table XI.

The analysis of variance of the pre-test scores is shown in Table
XII.

As in the case of the PST, the variance due to treatments is
insignificant (F = 1.734) while that for grades is significant
(F = 11.146, P = <.001). However, for the CT the interaction variance
is also significant (F = 2.752, P = <.03), a disturbing occurrence,
since it indicates that the sampling of classes is non-random. The
source of the interaction seems obvious; the means for fourth and
fifth grades in Control₁ and for the experimental sixth grade (I) are
atypical when compared with means in the same level or in the same
treatment.
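
The F-ratios quoted above are each a mean square divided by an error mean square. The sketch below, which is illustrative only, recomputes them from the mean squares printed in Table XII, first with the within cells MS as the error term and then with the T x G MS, the alternative the authors mention in a footnote. The dictionary keys and the function name are hypothetical.

```python
# Mean squares as printed in Table XII (CT pre-test scores).
MEAN_SQUARES = {
    "treatments": 31.785,
    "grades": 204.293,
    "interaction": 50.440,  # the T x G mean square
    "within_cells": 18.328,
}


def f_ratios(mean_squares, error_term):
    """Divide each effect's mean square by the chosen error mean square."""
    error_ms = mean_squares[error_term]
    return {
        effect: ms / error_ms
        for effect, ms in mean_squares.items()
        if effect not in (error_term, "within_cells")
    }


# Error term used in the body of the report.
against_within = f_ratios(MEAN_SQUARES, "within_cells")
# Error term that applies when the interaction is significant.
against_interaction = f_ratios(MEAN_SQUARES, "interaction")

for effect, value in against_within.items():
    print(f"{effect}: F = {value:.3f} (within cells error)")
for effect, value in against_interaction.items():
    print(f"{effect}: F = {value:.3f} (T x G error)")
```

The second set of ratios is much smaller because the interaction mean square is nearly three times the within cells mean square.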

Lack of randomness is not unexpected when 
intact school classes are assigned to treatments. 
It is an unfortunate happening in factorial design, 
but in this particular case we need not yet be 
overly concerned. If the treatment effects are 
exceedingly strong, it is quite possible that the 
adjusted post-test scores will not have a signifi- 
cant interaction. In that case all will be well. 
If, however, the interaction effect remains, then
the within cells MS will no longer be the appro- 
priate error term for testing the main effects 
and the analysis will be very coarse and prob- 
ably unrevealing. 

The CT: Post-Test— The post-test means 
are shown in Table XIII. The pre-test means 


TABLE VIII

ANALYSIS OF VARIANCE OF POST-TEST SCORES ON THE PST

Source          d.f.       SS         MS         F        P
Treatments        2      456.68    228.340    23.400    <.001
Grades            3      223.30     74.400     7.625    <.001
T x G             6       81.61     13.602     1.394    >.20
Within Cells    216     2107.82      9.758
Total           227     2869.31


TABLE IX

COMPARISONS OF TREATMENT GROUPS ON THE PST POST-TEST

Comparison                     t        P
Experimental - Control₁      5.68    <.0001
Experimental - Control₂      6.20    <.0001
Control₁ - Control₂          1.03     .30


TABLE X

SIGN TEST ANALYSIS OF PRE- MINUS POST-TEST SCORES ON THE PST

                Plus    Minus    Zero    Total
Experimental      6       57      13       76
Control₁         27       48      13       88
Control₂         31       20      13       64

Chi-square = 33.46; d.f. = 4; P = <.0001



TABLE XI

MEAN PRE-TEST SCORES ON THE CAUSAL TEST

Grade         Experimental    Control₁    Control₂     Total
Fourth           12.53          15.76       14.50      14.40
Fifth            12.05           9.24       12.63      11.03
Sixth (I)         8.42          11.48       11.13      10.42
Sixth (II)       10.84          10.68       10.88      10.78
Total            10.96          11.79       12.28      11.658


TABLE XII

ANALYSIS OF VARIANCE OF PRE-TEST SCORES ON THE CT

Source          d.f.       SS         MS         F         P
Treatments        2       63.57     31.785     1.734    <.20 >.10
Grades            3      612.88    204.293    11.146    <.001
T x G             6      302.64     50.440     2.752    <.03
Within Cells    228     4178.89     18.328
Total           239     5157.98

TABLE XIII

MEAN PRE-TEST AND POST-TEST SCORES ON THE CT

               Experimental        Control₁          Control₂           Total
Grade          Pre      Post      Pre      Post     Pre      Post     Pre       Post
Fourth        12.53   [illeg.]   15.76   [illeg.]  14.50    12.56    14.40     10.28
Fifth         12.05   [illeg.]    9.24   [illeg.]  12.63    11.25    11.03      6.92
Sixth (I)      8.42   [illeg.]   11.48   [illeg.]  11.13    10.44    10.42      7.85
Sixth (II)    10.84   [illeg.]   10.68     9.50    10.88     6.88    10.78      7.17
Total         10.96     4.63     11.79     9.08    12.28    10.28    11.658     8.054



are again included for comparative purposes.

All twelve of the classes show some decrement, and all three
treatment groups show reductions in mean score. Control₂ fell 2.00
points, Control₁ 2.71 points, while the experimental group dropped
over six points, a decrease of more than 55 percent. The analysis of
variance of post-test scores is shown in Table XIV.
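
The drops quoted above follow directly from the Total row of Table XIII. As a small worked check, and assuming that the pre- and post-test totals are paired with the groups as labelled below (the labels are an inference from the size of the reported drops), the arithmetic can be sketched as follows. The variable names are hypothetical.

```python
# Group means from the Total row of Table XIII (pre-test, post-test).
pre_means = {"experimental": 10.96, "control_1": 11.79, "control_2": 12.28}
post_means = {"experimental": 4.63, "control_1": 9.08, "control_2": 10.28}

# Absolute drop in mean CT score for each group.
drops = {
    group: round(pre_means[group] - post_means[group], 2)
    for group in pre_means
}

# Drop expressed as a percentage of the pre-test mean.
percent_drops = {
    group: round(100 * drops[group] / pre_means[group], 1)
    for group in pre_means
}

for group in pre_means:
    print(f"{group}: fell {drops[group]} points ({percent_drops[group]} percent)")
```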

The variance due to interaction is clearly still significant
(F = 5.031, P = <.001). F-ratios and P-values for treatments and
grades were computed using both the within cells MS and the T x G MS
as error terms. The treatments MS is significant in either case, the
respective Fs being 40.040 and 7.958, the respective Ps, <.001 and
<.025.

The significance of differences among treatment groups is
encouraging, but the persistent interaction is still a problem. There
is not much point in testing for homogeneity of regression until we
determine whether or not the interaction will remain significant when
it is adjusted by covariance. Accordingly, the adjusted interaction
MS and the adjusted within cells MS were computed. The results are
shown in Table XV.

The interaction remains significant even after adjustment, the
F-ratio being 7.956, P = <.001. This means that the within cells MS
is no longer an appropriate error term for testing the main effects.
The design would be left with only 10 degrees of freedom, 2 for
treatments, 3 for grades, and 5 for interaction (since 1 d.f. is lost
from the error term due to adjustment). Such an analysis could hardly
be expected to provide significant results unless the treatments were
practically infinitely powerful. One would hardly consider
undertaking an experiment with only three scores in each treatment
group.

Rather than forego the increased sensitivity of design offered by the
within cells error, the data were inspected in the hope of
discovering the source of the significant interaction. An examination
of the data in Table XIII revealed that the sixth grade (II) class in
Control₂ had dropped significantly on the post-test. Its pre-test
mean was 10.88 and its post-test mean was 6.88. The t-score of the
difference is 4.65, which is significant beyond the .01 level for 14
d.f. The difference of 4.00 points is more than twice that for any
other class in Control₂ and greater than that for any class in
Control₁, the treated control group. This class evidently contributes
a considerable amount to the significance of the interaction. It does
not seem conceivable that a single untreated control class should
show a significant decrement. It is probable that this class had been
exposed to some uncontrolled "treatment" during the course of the six
months intervening between pre- and post-tests.4 It was decided that
sufficient grounds existed for dropping out this entire level from
the analysis proper, if for no other reason than to determine
statistically if this single class was, in fact, accountable for any
large part of the interaction. The recomputed analysis of pre-test
scores based on 9 classes and 180 subjects is shown in Table XVI.

The results are almost identical with those of the original analysis
of pre-test scores in Table XII. Treatment variance is insignificant
and variances due to grades and interaction are significant. As in
Table XII, if the T x G MS is used as the error term, the grades
variance also becomes insignificant.

The analysis of post-test scores is shown in Table XVII.

Once again the results are practically unchanged. Except for minor
discrepancies in P-values, Table XVII shows the same kind of data as
did Table XIV, the original post-test analysis. All three effects are
significant when tested by the within cells error; the treatment
variance remains significant when T x G is the error term, while the
grades variance becomes insignificant.

So far, the elimination of the sixth grade (II) level has not changed
the analysis. The next step is to adjust the within cells MS and the
T x G MS by covariance, as in the original analysis. The computations
are shown in Table XVIII.

The F-ratio for the interaction is now only 0.994, which is clearly
insignificant. A comparison of the results in Table XVIII with those
of the original analysis in Table XV reveals the marked effect of the
elimination of the sixth grade (II) Control₂ class. No other result
is changed, but the interaction goes from highly significant to
insignificant.

Homogeneity of regression must still be demonstrated before we can
proceed to make the crucial adjustments of the treatment means. The
MS for differences among group regressions is 14.093 and the MS for
deviations from group regression is 8.577. The F-ratio for the test
is 14.093/8.577 = 1.643. The P-value is <.20 >.10 for 8 and 162
degrees of freedom. Hence, we may accept the null hypothesis and
conclude that the cells have homogeneous regressions.
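
The test just described reduces to one ratio and two degrees-of-freedom counts. The sketch below is an illustration under stated assumptions, not the authors' worksheet: it assumes the usual form of the test, in which k separately fitted cell regressions are compared with a single pooled slope, giving k - 1 degrees of freedom among regressions and N - 2k for deviations. With the 9 classes and 180 subjects remaining, this reproduces the 8 and 162 degrees of freedom quoted above. The function name is hypothetical.

```python
def regression_homogeneity_f(ms_among, ms_deviation, n_subjects, n_cells):
    """Return the F-ratio and degrees of freedom for homogeneity of regression."""
    # One slope is fitted per cell; pooling them frees n_cells - 1 d.f.
    df_among = n_cells - 1
    # Each cell regression uses two parameters (slope and intercept).
    df_deviation = n_subjects - 2 * n_cells
    return ms_among / ms_deviation, df_among, df_deviation


# Mean squares as quoted in the text; 9 classes and 180 subjects remain
# after the sixth grade (II) level is eliminated.
f_value, df_among, df_deviation = regression_homogeneity_f(
    ms_among=14.093, ms_deviation=8.577, n_subjects=180, n_cells=9
)

print(f"F = {f_value:.3f} for {df_among} and {df_deviation} d.f.")
```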

Having demonstrated homogeneity of regression, we may now proceed to
adjust the sums of squares for treatments and grades for the crucial
test. The adjusted data, plus the data of Table XVIII, are shown in
Table XIX.

The adjusted variance for treatments yields an F-ratio of 53.933,
which is significant beyond the .0001 level. The F for grades is
4.024, P = .02. There can be no doubt but that the treatment variance
is significant in the final analysis and we must conclude that there
have been real treatment effects during the six and one-half months
intervening between pre- and post-testings. By comparing the
unadjusted
within cells MS (14. 828) with the adjusted with- 


TABLE XIV

ANALYSIS OF VARIANCE OF POST-TEST SCORES ON THE CT

                                                      F                  P
Source          d.f.       SS         MS        Within    Int.     Within    Int.
Treatments        2     1213.22    606.610     40.040    7.958     <.001    <.025
Grades            3      425.55    141.850      9.363    1.861     <.001    >.20
T x G             6      457.37     76.228      5.031              <.001
Within Cells    228     3454.16     15.150
Total           239     5550.30


TABLE XV

COMPUTATION OF THE POST-TEST INTERACTION ON THE CT
ADJUSTED BY COVARIANCE

Source          d.f.       SS         MS         F        P
T x G             6      440.21     73.368     7.956    <.001
Within Cells    227*    2093.39      9.222

*One degree of freedom lost due to adjustment


TABLE XVI

ANALYSIS OF VARIANCE OF PRE-TEST SCORES ON THE CT
WITH GRADE LEVEL 6 (II) ELIMINATED

Source          d.f.       SS         MS         F        P
Treatments        2       85.47     42.735     2.395     .10
Grades            2      551.63    275.815    15.458    <.001
T x G             4      280.27     70.068     3.927    <.005
Within Cells    171     3051.18     17.843
Total           179     3968.55



in cells MS (8.837) we see that the covariance analysis has nearly
doubled the precision of the tests of the main effects, a result
which makes the required time and effort well worthwhile.
We can now be certain that the treatment effects are significant, but
we do not yet know which group or groups account for the
significance. To investigate this point we first adjust the cell and
treatment means and then apply t-tests to individual pairs. The
adjustment of means is accomplished by the use of a regression
equation which is derived from the covariance analysis.
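
The regression equation itself is not printed in the report. As a hedged reconstruction, the standard covariance adjustment found in textbook treatments has the form below. The symbols are assumptions introduced here for illustration: X is the pre-test score, Y the post-test score, j indexes a class or treatment group, and b_w is the pooled within cells regression coefficient.

```latex
% Adjusted post-test mean for group j (standard covariance adjustment).
\[
  \bar{Y}_{j}^{\,\mathrm{adj}}
    = \bar{Y}_{j} - b_{w}\left(\bar{X}_{j} - \bar{X}_{\cdot}\right),
  \qquad
  b_{w} = \frac{\sum_{\text{cells}} \sum_{i}
                 (X_{ij} - \bar{X}_{j})(Y_{ij} - \bar{Y}_{j})}
               {\sum_{\text{cells}} \sum_{i} (X_{ij} - \bar{X}_{j})^{2}}
\]
% \bar{X}_{\cdot} is the grand mean of the pre-test scores.
```

Under this form, a group whose pre-test mean lies above the grand mean has its post-test mean lowered, and a group that started below the grand mean has it raised, so that the adjusted means can be compared as if all groups had begun at the same level.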

Table XX lists the adjusted means for each 
class and for the various treatment groups. 
These means have had the effects of the respec- 
tive pre-test means eliminated from them and 
will thus stand alone for comparison with each 
other without reference to the pre-test means. 

The next step is to compute t-tests for comparisons of pairs of
treatment means. The three t-test results are shown in Table XXI.

The t-tests reveal a clear-cut trend; the experimental group has the
lowest adjusted mean score, differing significantly from both control
groups; Control₁ has a significantly lower mean than Control₂.
Returning to Table XX, we see that this trend holds true for all of
the grade levels as well as for the treatment means.

The main analysis is now complete. It will be discussed in the next
section.

Reliability of the CT. The test-retest reliability of the CT is .73.
The remarks concerning the reliability of the PST also apply here.

Discussion and Conclusions. There is little doubt that the classes of
the experimental teachers showed a marked change on both measures
when compared with the classes of the control teachers. The
statistical difficulties (the lack of homogeneity of regression for
the PST and the peculiar interaction effect for the CT) do not
obviate the large experimental-control differences.

It is perhaps unfortunate that the covariance analysis was
inapplicable to the PST. There is, however, a plausible explanation
for the heterogeneity of regression which precluded its use.
Examination of Table VII shows that the post-test means for the
experimental classes, especially the fifth grade and the sixth
grades, are perilously near the ceiling (i.e., the lowest possible
score) of the test. This means that a considerable number of subjects
obtained the same scores, mostly in the range 0-2. Since no one could
improve beyond a score of zero, many of the subjects whose pre-test
scores varied achieved a common post-test score. This tends to
attenuate the pre-post correlation. This was, of course, not true for
the control groups. The net result was that the experimental classes
showed different regressions than the controls.

The correlation between pre- and post-test scores was .74 for
Control₁ subjects, .71 for Control₂ subjects, but only .44 for the
experimental group.

Evidently the PST is an inadequate index for the pre-post type of
experimental design since (a) the mean PST pre-scores are too low,
and (b) the ceiling of the test is then too close to the pre-scores
to permit adequate discrimination among members of experimental
groups.

The performance of the Control₁ classes is not easily evaluated. This
group of teachers was made acquainted with various teaching aids used
by the experimental teachers and was permitted to use any that they
wished in any fashion and for any amount of time. Only the briefest
instructions concerning use of the materials were given since it was
felt that such instructions fell within the province of the teacher
training program. The purpose of the inclusion of Control₁ was to
attempt to determine the effects of using the teaching materials
uninstructed as opposed to the effects of the training program. The
results are somewhat ambiguous. Two of the three analyses show that
the Control₁ subjects improved significantly more than the wholly
untreated Control₂ though not as much as the experimental subjects.
Individual t-tests based on adjusted CT scores and individual
chi-squares based on the sign test of PST results reveal this trend.
The t-tests derived from the analysis of variance of PST results do
not show this trend.

In an attempt to investigate this point further, the teachers in the
experimental group and in Control₁ were asked to estimate the amount
of class time spent in the use of the teaching aids. The amount of
time in hours for each teacher is shown in Table XXII.

The experimental teachers used the teaching materials much more than
did the control teachers, as would be expected. The experimental
teachers also varied only slightly among themselves, as evidenced by
the average deviation from the mean of 1.5 and the mean of 34 hours.
On the other hand, the control teachers varied considerably, the
average deviation being 4.5 and the mean 7 hours.
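
The means and average deviations quoted above can be reproduced from the hours listed in Table XXII. The sketch below assumes that "average deviation" means the mean absolute deviation from the group mean, which is consistent with the figures given. The function and variable names are hypothetical.

```python
def mean_and_average_deviation(values):
    """Return the mean and the mean absolute deviation from that mean."""
    mean = sum(values) / len(values)
    average_deviation = sum(abs(v - mean) for v in values) / len(values)
    return mean, average_deviation


# Classroom hours from Table XXII, in the order
# fourth, fifth, sixth (I), sixth (II).
experimental_hours = [33, 33, 37, 33]
control_1_hours = [14, 2, 9, 3]

exp_mean, exp_dev = mean_and_average_deviation(experimental_hours)
c1_mean, c1_dev = mean_and_average_deviation(control_1_hours)

print(f"experimental: mean {exp_mean}, average deviation {exp_dev}")
print(f"Control 1: mean {c1_mean}, average deviation {c1_dev}")
```

Relative to its mean, the control group's spread is far larger than the experimental group's, which is the contrast the text draws.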

From the data in Table VIII we saw that there was no interaction
between treatments and grades on the PST although both effects were
significant. The same conclusions apply to the CT (Table XIX) when
the sixth grade (II) level was eliminated. The lack of interaction
means that the differences between experimental and Control₁ classes
were about the same from grade level to grade level despite the
significant overall differences in grade level scores. The difference
between the experimental fourth grade class and the Control₁ fourth
grade class is about the same as the difference between the two fifth
grade classes, and so on.


TABLE XVII

ANALYSIS OF VARIANCE OF POST-TEST SCORES ON THE CT WITH
GRADE LEVEL 6 (II) ELIMINATED

                                                      F                   P
Source          d.f.       SS         MS        Within     Int.     Within     Int.
Treatments        2     1341.72    670.860     45.243    11.642     <.001     <.025
Grades            2      362.53    181.265     12.225     3.148     <.001     <.2 >.1
T x G             4      231.08     57.626    [illeg.]             [illeg.]
Within Cells    171     2535.62     14.828
Total           179     4470.95


TABLE XVIII

COMPUTATION OF THE POST-TEST INTERACTION ON THE CT WITH GRADE
LEVEL 6 (II) ELIMINATED, ADJUSTED BY COVARIANCE

Source          d.f.       SS         MS         F        P
T x G             4       35.13      8.738     0.994    >.20
Within Cells    170*    1502.25      8.837

*One degree of freedom lost due to adjustment


TABLE XIX

ANALYSIS OF COVARIANCE OF THE POST-TEST CT SCORES WITH
GRADE LEVEL 6 (II) ELIMINATED

Source          d.f.       SS         MS         F        P
Treatments        2      953.21    476.605    53.933    <.0001
Grades            2       71.12     35.560     4.024     .02
T x G             4       35.13      8.738     0.994    >.20
Within Cells    170*    1502.25      8.837

*One degree of freedom lost due to adjustment


TABLE XX

POST-TEST MEAN SCORES ON THE CT,
ADJUSTED BY COVARIANCE

Grade       Experimental    Control₁    Control₂
Fourth         5.192         10.223      11.076
Fifth          4.572          7.457      10.854
Sixth          5.634          9.714      10.917
Total          5.132          9.131      10.951


TABLE XXI

COMPARISONS OF TREATMENT GROUPS ON THE CT ADJUSTED
POST-TEST SCORES

Comparison                      t         P
Experimental - Control₁       7.603    <.0001
Experimental - Control₂       9.793    <.0001
Control₁ - Control₂           3.315     .001


TABLE XXII

CLASSROOM HOURS SPENT USING TEACHING AIDS

                                   Sixth    Sixth             Average
Group           Fourth    Fifth     (I)     (II)     Mean    Deviation
Experimental      33        33      37       33       34        1.5
Control₁          14         2       9        3        7        4.5



The data in Table XXII show that the experimental classes were all
treated approximately alike with respect to number of hours of use of
teaching aids. The control classes, however, varied from 2 hours to
14 hours. If the use of teaching aids alone had any real effects, we
would expect to have found a variation in differences between pairs
of classes. The two fourth grade classes, for example, should not
differ as much as the two fifth grade classes since the fourth grade
Control₁ teacher spent seven times as much time with teaching aids as
did the fifth grade Control₁ teacher. This variation would be
reflected in a significant interaction between treatment and grades.
Since no such interactions were found with the PST or with the CT
after the elimination of the sixth grade (II) level, we conclude that
amount of class time spent using the teaching materials was probably
not the sole factor making for reduction in test scores, even if we
accept the conclusion that there was a significant change in the
scores of the subjects in Control₁ classes. That we should accept
this conclusion is still open to question, as is the matter of what
factor did influence the scores of the subjects in Control₁ if we do
not accept it. We have gone as far as we can reasonably go with the
present analysis. Data are not available to settle either question
satisfactorily.

Summary. The subjects of this investigation were classroom teachers
and their pupils, each classroom matched with two control groups. The
teachers participated in a training program designed to extend their
understanding and appreciation of child behavior, to provide
opportunity for growth in personal adjustment and to develop methods
for teaching causally oriented curricular content.

Measures of the child's awareness of the complex multiple and
causative nature of human behavior and of his tendency to immediate
punitiveness were administered to both experimental and control
groups in the fall of the school year and again approximately 6-1/2
months later.

Extended analyses of the results using both parametric and
non-parametric methods, as the nature of the data indicated, were
applied.

The classes of the experimental teachers showed distinctly
significant changes on the two measures used when compared with
classes of the control teachers.

It appears that when we bring children of the upper elementary grade
levels under the influence of causally oriented teachers teaching
causally oriented content, we bring about significant differences in
the child's growth in the aspects measured in this study.

Additional differences between causally oriented and control subjects
will be reported in other papers.



REFERENCES

1. Dixon, W. H. and Massey, F. J. Introduction to Statistical
   Analysis (New York: McGraw-Hill Co., 1951).

2. Levitt, E. E. "Punitiveness and 'Causality' in Elementary School
   Children," Journal of Educational Psychology (in press).

3. Levitt, E. E. and Lyle, W. H. "Evidence for the Validity of the
   Children's Form of the Picture-Frustration Study," Journal of
   Consulting Psychology, 1955 (in press).

4. Levitt, E. E. and Ojemann, R. H. "The Aims of Preventive
   Psychiatry and 'Causality' as a Personality Pattern," Journal of
   Psychology, XXXVI (1953), pp. 393-400.

5. Lindquist, E. F. Design and Analysis of Experiments (New York:
   Houghton Mifflin Co., 1953).

6. Lyle, W. H. and Levitt, E. E. "Punitiveness, Authoritarianism and
   Parental Discipline of Grade School Children," Journal of Abnormal
   and Social Psychology, 1955 (in press).

7. Ojemann, R. H. "An Integrated Plan for Education in Human
   Relations and Mental Health," Journal of National Association of
   Deans of Women, XVI (1953), pp. 101-108.

8. Stiles, Frances S. A Study of Materials and Programs for
   Developing an Understanding of Behavior at the Elementary School
   Level, Ph.D. Dissertation, University of Iowa, 1947.

9. Stiles, Frances S. "Developing an Understanding of Human Behavior
   at the Elementary School Level," Journal of Educational Research,
   XLIII (1950), pp. 516-524.

10. Zelen, S. L. Effect of a Causal Learning Program, Mimeographed
    Report, Preventive Psychiatry Project (Iowa City: State
    University of Iowa, 1954).



FOOTNOTES

1. All of the analyses of variance computed for this report are based
   on Lindquist's procedures (5). "Treatments" refers to the three
   primary groups, the experimental and the two controls. "T x G" is
   the treatment-by-grades interaction. The "within cells" mean
   square is the over-all standard error when other systematic
   differences have been eliminated. If the interaction is not
   significant, the mean square for within cells is the appropriate
   error term for testing the effects. In the remaining tables of
   this kind, "sum of squares" will appear as SS and "mean square" as
   MS.

2. The assumptions necessary for the use of covariance will be found
   in (5).

3. Strictly speaking the within cells MS is not the appropriate error
   term in Table XII, although it was used there to test the main
   effects. If the T x G MS, which is really the appropriate error
   term, had been used, the respective F-ratios for treatments and
   grades would be 0.636 and 4.05, the latter falling just short of
   the .05 level of significance. Since the treatments MS simply
   remains insignificant and since we are not interested in grade
   differences, it is immaterial which error term is used.

4. A lengthy interview with the teacher by the experimenter most
   familiar with her (Whiteside) did not provide any clues as to what
   this uncontrolled factor might have been.

5. The interaction did not actually involve the sixth grade (II)
   class in Control₁ but was rather a function of this class in
   Control₂. Comparing only the experimental and Control₁ groups,
   there was no significant interaction even with the sixth grade
   (II) class included.


THE SELECTION OF CANDIDATES FOR 
TEACHER EDUCATION AT THE 
UNIVERSITY OF WISCONSIN 


GUSTAVE JOHN STOELTING * 
Milwaukee Public Schools 


SECTION I 


BACKGROUND OF THE PRESENT 
INVESTIGATION 


A. Basic Principles in Screening of Candidates 
for Teaching 


MANY NEW screening procedures have come to be used by teacher training institutions during the last decade in an effort to select students capable of becoming superior teachers. This seems important if our schools are to have competent leadership. Today the majority of teacher-education schools employ some form of screening of persons admitted or educated as teachers.

The strong interest in teacher selection arises out of a three-fold need:


1. To maintain high standards in the profession at a time when emergency measures to overcome teacher shortages may permit standards to fall.

2. To find more individuals equal to the increasingly complex task of teaching.

3. To prevent the wastes created when individuals are trained for positions for which they are personally or intellectually not qualified.


The teacher shortages of the past twelve years have given rise to emergency measures that permit large numbers of poorly qualified individuals to become teachers. While teacher training institutions and professional organizations have attempted to overcome shortages with programs of recruitment, merely encouraging greater numbers to become teachers is not an adequate solution to the problem. To have good teachers it is necessary to exercise a certain amount of selection from among the larger groups of interested individuals who choose to become teachers. The result has been a greater variety of screening devices and their more widespread use.


The literature on teacher selection repeatedly stresses the importance of protecting the public welfare through careful selection of candidates for teacher education. New concepts of learning and development of young individuals combine to make classroom management a highly intricate procedure. The necessity of helping young people understand a complex environment and the demands that it makes on the individual emphasize further the necessity for more competent teachers. Individuals with specific qualities are frequently called to meet the requirements of special situations. Stiles (30) summarizes this point as follows:

    Superficial consideration might lead one to believe that democratic principles would compel institutions to admit all who desire to become candidates for teacher education. More objective thought, however, would help one to realize that since education is a function of the state and is maintained for its own good, therefore, the state has not only the right but also the responsibility to secure the best possible teachers. Merely providing state institutions of higher learning with competent professors and adequate curricula will not assure the state that superior teachers will be developed. The type of teacher that the university or teachers college will ultimately produce is dependent upon the quality of persons who are accepted for training.

Much of what has been done in construction of devices for screening of teacher candidates has been based on investigations of factors involved in successful teaching. The factors most commonly used in screening teacher candidates are intelligence and scholastic achievement. La Duke (19), Rostker (26), and Seagoe (27) independently investigated the relationship between intelligence and success in teaching. They found a significant, positive relationship. As measures of intelligence have become generally more reliable, the use of this device for screening of teacher candidates has become almost universal.

Lins (20) and Stuit (31) provide data on the relationship between scholastic achievement and success in teaching. Most teacher training institutions today specify a minimum of scholastic achievement as a part of their program of screening for teacher candidates.

* The author wishes to express his appreciation to Dr. C. S. Liddle, Dr. G. G. Eye, and Dr. A. S. Barr for helpful criticisms and suggestions in the planning and carrying out of the study.

In addition to using intelligence and scholastic achievement as measures for predicting future teaching success, some teacher training institutions are also using less well known measures of yet other factors in teaching success. Flanagan (11) and Seagoe (27) provide data supporting the use of a general culture test for screening prospective teachers. The importance of proficiency in reading and speech as vital qualities of the successful teacher is supported by data provided by Flanagan (11), Henrickson (14), and McCoard (21).

Personality as a factor in teacher success 
has stimulated much interest and lively discus- 
sion. The use of personality measures in teach- 
er selection is on the increase. More general 
use of such measures is limited because both sat- 
isfactory rating scales and standards of teacher 
personality are lacking. Some institutions now using this factor in a screening device do so only to discover instability in a candidate. Experimental screenings of candidates using personality devices are under study in a number of teacher training institutions.


B. General Features of Screening for Teacher 
Selection 


The screening devices described in this section and their placement in a program of selection represent a composite of practices as reported in the literature on teacher selection. There is much variation among teacher training institutions in this respect.

Admission is the crucial point in most teach- 
er selection programs. This is true largely be- 
cause failure to predict the success of a candidate 
at the time of admission may lead to great waste. 
As a result, a large number of devices are in use by teacher training institutions to screen for admission. Screening devices may be divided into two categories: application and orientation.

Screening through application generally is 
based on information on the educational and home 
background. The most frequently used data are 
the high school record, scholastic attainment, 
personality and attitude ratings of the applicant 
by school staff members, standardized test scores, and participation in extra-curricular activities. To complete the data on educational
background the administrative head of the school 
from which the applicant is graduated is ordinar- 
ily requested to make some sort of a statement 
regarding the applicant's general acceptability 
for continued training. 

Sometimes the information on educational 
background is supplemented by a wide variety of 
information on home and personal background. 
Such questions as age, occupation, and educational attainment of various members of the family are asked. The applicant is also sometimes
requested to furnish an autobiography, a report 
of financial responsibility, a statement of pur- 
poses for continuing his education, and personal 
references regarding his acceptability. 

Admission is usually followed by a period of intensive testing to aid in orientation. Some institutions prefer to have the results of such testing in hand before the candidate's application for admission is acted upon. While the latter arrangement has some obvious disadvantages, it does provide much additional data to assist in making the vital decision made at the time of admission.

Testing programs at the time of admission 
generally include such areas as English place- 
ment, general culture, intelligence, interests, 
personality, and reading. Within these areas there is a wide variety of instruments used.

The greatest variation in the selection and 
use of test instruments lies in the area of per- 
sonality where there appears to be little agree- 
ment on the qualities of a good teacher. Among 
the instruments frequently used are the Minne- 
sota Multiphasic Personality Inventory, the Bell 
Adjustment Inventory, and the Bernreuter Personality Inventory. A subjective technique, the group interview, and projective techniques are being explored by several teacher training institutions.

Two devices other than standardized tests are also commonly used in screening for teacher selection: the physical examination and the speech test. The physical examination ordinarily seeks to determine not only whether an individual is capable of undertaking a normal course of studies, but also if he has any physical defects which might limit his efficiency as a teacher and thus increase the element of risk involved in accepting him as a candidate.

Speech tests have a dual purpose. They are used to eliminate those applicants whose speech
defects are such as to be a definite handicap in 
the profession. They also disclose remediable 
defects in applicants otherwise qualified. 

Applications for admission generally are reviewed by an admissions official responsible for weighing the quality of each applicant as a student and prospective teacher in the light of all the evidence available. As screening procedures have become more refined a few teacher training institutions have turned the important task of evaluating an applicant's qualifications over to a committee. Such a move emphasizes the importance attached to the evaluation of an individual's potential success as a teacher when application is made for admission.

A second important point in the teacher selection programs of many teacher training institutions occurs at the end of the second year of preparation, or the beginning of the third year. Here it takes the form of “Admission to Professional Study” or “Admission to Senior College”, or it may simply constitute a more intensive stage in evaluation of the candidate's qualifications in a continuous screening process. By and large, no matter what the name or what procedures are used, at this point the candidate's progress is examined as to his suitability as a teacher.

(Vol. 24

December, 1955)

The most important new data commonly used at this point is the record of a candidate's academic achievement and the pattern of credits earned. Academic achievement as measured by the Grade Point Average or its equivalent is used by most schools to maintain minimum standards. Of all the devices for screening reported in the literature, the maintenance of minimum scholastic standards appears most frequently.
To insure uniform training deemed essential for teachers, many teacher training institutions specify an academic pattern to be followed by their candidates. While this device does not provide specifically for screening, it again sets up requirements which the institution deems necessary for teacher success.

Some teacher training institutions employ interviews, a physical examination, and a speech test at this point to determine admission to the teacher training program rather than at the time of admission to the school. A few institutions use these devices both at the time of admission and after the first two years of training.
As further evidence of a candidate's promise of becoming a teacher some schools examine the type of activities in which the candidate has engaged beyond the requirements of the curriculum. Emphasis is placed upon participation in those activities which appear to contribute to teacher success following graduation. The use of this device, as in the case of specific curricular requirements, does not necessarily constitute screening in that it selects the better candidates and eliminates the poorer. It does, however, set up requirements which must be met, thus exposing the candidate to further experiences and training on which he may draw later as a well-qualified teacher.
The pattern of evaluation already discussed is often combined with evaluation of achievement in professional training courses and practice teaching. Thus a continuous process of evaluation over the entire period of professional training is provided.

In general, it would appear that much of the screening practices of the final two years of training are different from those that are commonly employed at the time of original admission. Most teacher training schools do the large share of screening at the time of admission. The promising candidates are admitted to school and the poor risks are eliminated. Thereafter, it would appear that the function of screening changes. If a candidate continues to make normal progress in meeting the increasing requirements of training and skill, the screening process “selects” him for further training. On the other hand, a candidate is eliminated if he does not have sufficient interest to fulfill the screening requirements satisfactorily or if a defect serious enough to impair successful teaching is discovered.

The final stage of screening by teacher training schools is graduation, certification, and placement. Most schools of education combine graduation and certification, i.e., the school certifies the individual as a teacher when he graduates. Some few schools draw a finer distinction between graduation and certification. These schools will graduate a candidate, provided he has fulfilled all the requirements; he may have achieved only the minimum professional standards required by law, but because he has not attained the standards set by the school, the candidate is not certified. To gain certification at such schools candidates must attain higher qualifications. The institution, in turn, certifies to the professional competence of the individual.

Placement is not generally regarded as a part of the screening program. However, teacher training institutions are sensitive to its screening function. Screening controls are restricted or relaxed as opportunities for placement change. In addition, the use of screening devices indicates the desire by schools for teacher education to reduce waste by eliminating those who might have difficulty in being placed.


C. Survey of the Literature 


The literature on teacher selection may be 
divided into four areas: 


1. General Philosophy 


Teacher selection is based on the important principle that good schools depend on well qualified teachers. Such standards of competency in turn depend upon good training, a background of accepted research, and capable individuals. But good training can be had only as research findings in the field are put in practice. On this basis the literature recognizes that the better selection of candidates is the key to higher professional standards.

It is further pointed out in the literature that better teachers are needed for the increasingly complex job of teaching. As the center of learning moves away from the highly organized fields of subject matter toward the needs of the individuals, the task of guiding and directing the learning process places increasing demands upon the teacher. In order to discharge such an important task adequately, more capable individuals are needed.

The literature also points to the necessity 
for reducing the human and social waste involved 
in training and employing individuals who are not 
well suited to the task of teaching. The develop- 
ment and use of good selection procedures is a 
means of avoiding such waste, while at the same
time providing for more efficient use of the teach- 
er-training facilities. 

Comprehensive statements of basic philosophy regarding teacher selection are found in articles by Flowers (12), Kirkpatrick (16), and Morris and Phillipson (23). The latter article is particularly valuable in that it describes the areas of research necessary to develop more adequate teacher-selection procedures.


2. General Surveys of Teacher Selection Procedures


Studies on the prevailing practices in teacher selection in limited areas of the United States have been reported by Haskew (13) and Stiles (30). Their conclusions may be summarized as follows:


a. Most institutions have a teacher selection program.

b. The programs range from simple requirements to highly developed procedures in the process of being further refined.

c. The element of timing in selection procedures varies from a single, “on the spot” selection made only once to a continuous selection process beginning while the candidate is still in high school and extending to graduation and placement.

d. Selection is the responsibility of one person in most institutions. Some few have a committee.

e. Most common bases of selection are:

   Scholastic achievement
   High school record
   Results of aptitude tests
   Results of interviews

f. Personality traits are being studied and used increasingly in teacher selection programs.

g. Some teacher training institutions without a selection program feel that as public institutions they do not have the right to exclude.

h. Most of the institutions without a selection program do not have one because of an apparent lack of reliable bases to make a selection.


3. Descriptions of Existing Programs of Selection


Reports of specific programs of teacher selection are common in the literature. Four of these reports bear mention here for they represent the most advanced practices in teacher selection. The selection program in the Connecticut Teacher's Colleges is described by Engleman and Larson (10); the plan in use in New Jersey is reviewed by West (32); the San Diego State College selection program is described in an article by Alcorn (1); and the development and operation of the teacher selection program at Syracuse University in New York is given by Smith (28) and White (33). The significant feature of each of these teacher selection plans is that it utilizes subjective techniques of personality analysis in addition to other more common sources of information to aid in making a judgment.


4. Summary of Review of Literature 


Comprehensive reviews of the literature on 
teacher selection have been written by Archer (3), 
Barr (4), and Haskew (13). From these articles 
the following conclusions may be drawn: 


a. There is a lack of reliable objective data on 
which teacher selection may be based. 

b. The reviews report several studies to show 
significant correlations between success in 
teaching and scholarship. 

c. Increasing use is being made of tests of aptitude for various special fields.

d. Speech tests are becoming more common in programs of teacher selection.

e. Greater emphasis is being placed on personality in teacher selection. Most of the teacher training institutions using this factor employ it to detect the unstable. Some few institutions report the use of experimental techniques to select candidates who demonstrate desirable personality traits in a social situation.

f. Evidence of leadership qualities is becoming increasingly important as a part of teacher-selection programs.

g. Tests of proficiency in basic skills are being used to supplement intelligence tests.

h. Committee selection procedures are steadily replacing selection by a single individual.

i. There is an increasing recognition that teacher selection cannot depend on a single factor, but must be based on a constellation of factors.


SECTION II 
STATEMENT OF THE PROBLEM 
A. Selection and Teacher Success 


THE CENTRAL problem of this study is to determine the efficiency with which the several selective devices employed at the University of Wisconsin operate in choosing potentially successful teachers out of the total group seeking admission and eventual certification for teaching. To do this the study will seek to answer five questions:

1. How well do present selection procedures discriminate between the superior teacher candidate and the teacher candidate who is likely to meet with only limited success?

2. Under what circumstances do selection devices now employed permit admission of individuals not likely to succeed?

3. Is there a basis for raising or lowering the standards by which candidates are admitted to professional training and certification as teachers?

4. At what point in the teacher education program is screening for teacher education likely to be most effective?

5. What recommendations, based on the findings of this study, can be made for improved procedures for the selection of candidates for teaching?

To study the effectiveness of the screening devices used at the University of Wisconsin, the data on which selection of the 1952 graduating class of the School of Education was based will be related to success of the individuals of the class following graduation. These data include:

a. Rank in high school class
b. Psychological scores
   Henmon-Nelson
   American Council on Education
c. Cooperative Reading test score
d. Cooperative General Culture test score
e. Predicted Grade Point average
f. Earned Grade Point average
g. Minnesota Multiphasic Personality Inventory score
h. Speech proficiency test score


These data will be correlated to the various criteria for measuring success in teaching. The criteria will consist of:

a. In-service rating by the principal or superintendent of those who were employed in a teaching situation during the year since graduation.

b. A departmental rating based on the estimate of the candidate's effectiveness as a teacher by the faculty of his major department.

c. Placement Bureau rating based on the candidate's general acceptability as a teacher.

d. Practice-teaching grades.


The ratings will first be considered separately, after which they will be combined into a single rating for each individual included in the study. A measure of the efficiency of the selection process used by the School of Education will be obtained through correlating the scores used in screening with the criteria of teaching success. By this means it will be possible to determine how well the screening devices can discriminate between teachers of superior, average, and inferior teaching ability. The information gained through this study should offer a basis for improving the screening procedures in the School of Education of the University of Wisconsin, and also provide a means for continuous evaluation of the program.
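
The correlational step described above can be sketched as follows. This is only an illustration, assuming a Pearson product-moment coefficient (the text says only that screening scores will be correlated with the criteria); the function name `pearson_r` and the sample scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL values: one screening score and one criterion rating
# for six imaginary graduates (not taken from the article).
high_school_percentile = [91, 75, 60, 88, 45, 70]
combined_rating = [4.0, 3.5, 2.5, 3.0, 2.0, 3.5]

r = pearson_r(high_school_percentile, combined_rating)
print(f"r = {r:.3f}")
```

The same function could be applied to each screening score in turn against each criterion, which is one plausible reading of how the ratings are first "considered separately."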


B. Selection of Candidates for Teacher Training 
at the University of Wisconsin 


The selection of candidates for teacher training at the University of Wisconsin has a dual purpose: (1) to assure that all individuals who are accepted for training as teachers will succeed, and (2) to assure that a larger proportion of those accepted for training are capable of becoming superior teachers. Thus screening seeks to protect individuals from entering a field of work in which they may not succeed, while at the same time protecting our schools by supplying better teachers.

A major point in the screening of candidates 
for teacher training at the University of Wiscon- 
sin occurs at the time of admission. The data 
on which admission is based includes personal 
data (physical characteristics, appearance, in- 
terests, ambitions), family data (nationality, 
parent's occupation, residence, siblings), edu- 
cational background (academic record, test rec- 
ord, pattern of credits earned, personality rat- 
ing, extra-curricular activities) and a statement 
by the administrator of the preparatory school 
regarding the educational promise of the individ- 
ual. 

The data is evaluated by an official in the Admissions Office at the University of Wisconsin. Greatest emphasis in the evaluation is placed upon the future academic promise of the applicant. Upon admission each student is assigned to the school of his choice and an advisor in his major field. A student who expresses a preference for entering the School of Education is enrolled as “Pre Ed”, and assigned to an advisor in the College of Letters and Science.

During the week of registration new students 
participate in a program of orientation to life at 
the University. An important feature of this pro- 
gram is the extensive testing done during the per- 
iod. The tests included in the program are the 
Cooperative Reading Test, the Cooperative Gen- 
eral Culture Test, and the American Council on 
Education Psychological Examination. The re- 
sults derived from these tests are used to ad- 
vise and counsel the student during his first two 
years at the University. These test results also have an important function in the screening of teacher candidates at the time of admission to professional study.

Following admission to the University, there is no direct screening of teacher candidates until the student applies for transfer to the School of Education at the end of the fourth semester of study. Two basic requirements must be met during the first four semesters to be admitted to professional study in the School of Education: (1) a student must have earned at least 62 credits of an approved course of study with a minimum 1.3 grade point average; and (2) the course of study a student presents for evaluation at the end of four semesters' work must meet the standard requirements for majors and minors, specific course requirements, and requirements varying according to the major and minor departments.

At the end of the fourth semester of study 
(or when 62 credits in an approved pattern have 
been earned) the student may apply for transfer 
to the School of Education for professional train- 
ing. Evaluation of a student's record up to that 
point constitutes a second major point in the 
screening process. Data on which the screening
is based includes a transcript of credits earned, 
grade point average, high school rank, and the 
results from the orientation tests taken during 
the registration period at the beginning of the 
first semester at the University. 

The most important factors in the screening
are the two basic requirements for admission to 
the School of Education—completion of course 
requirements and maintenance of a 1.3 grade 
point average. 

Course requirements which must be completed before an applicant may be admitted to professional study include:


a. English attainment requirements 

b. Physical Education or Military Science 

c. Minimum requirements in majors and 
minors 

d. A minimum of 62 credits 


In addition each major department has varying requirements which the individual must meet.

In some cases when most requirements have been met and the candidate presents records otherwise suitable, he may be admitted on the condition that certain deficiencies will be removed during the following semester. In other cases where many requirements remain to be completed, the candidate must utilize an additional semester or summer session before application for transfer may be made.

The transcript of credits earned is also evaluated for grade point average (based on 1 grade point per credit for a final course grade of “C”, 2 grade points per credit for a final course grade of “B”, and 3 grade points per credit for a final course grade of “A”). A minimum total grade point average of 1.3 is specified for admission to the School of Education.
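
The grade point computation and the two basic requirements can be illustrated with a short sketch. The point values for A, B, and C, the 62-credit minimum, and the 1.3 minimum average come from the text; treating grades below C as worth 0 points is an assumption made here for illustration, and the function names and sample transcript are hypothetical.

```python
# Grade points per credit as stated in the text: C = 1, B = 2, A = 3.
# ASSUMPTION: grades below C earn 0 points; the article does not say.
GRADE_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": 0}

MIN_CREDITS = 62  # minimum credits for transfer (from the text)
MIN_GPA = 1.3     # minimum grade point average (from the text)


def grade_point_average(courses):
    """Return credit-weighted grade points per credit.

    `courses` is a list of (credits, letter_grade) pairs.
    """
    total_credits = sum(credits for credits, _ in courses)
    if total_credits == 0:
        raise ValueError("no credits on the transcript")
    total_points = sum(credits * GRADE_POINTS[grade] for credits, grade in courses)
    return total_points / total_credits


def meets_basic_requirements(courses):
    """Check the credit minimum and the grade point minimum together."""
    total_credits = sum(credits for credits, _ in courses)
    return total_credits >= MIN_CREDITS and grade_point_average(courses) >= MIN_GPA


# HYPOTHETICAL transcript: 20 credits of A, 22 of B, 20 of C.
transcript = [(20, "A"), (22, "B"), (20, "C")]
print(grade_point_average(transcript))
print(meets_basic_requirements(transcript))
```

On this scale a transcript made up entirely of C grades averages 1.0, which falls below the 1.3 minimum, so a candidate needs some work above C level to qualify.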

A candidate whose application for admission to teacher training is rejected on the basis of a grade point average too low to meet the minimum requirement may request to have his case reviewed by the Dean of the School of Education, or an assistant. In such a case, compensating factors such as an above-average I.Q., or above-average high school rank, are sought in the candidate's records. Such candidates whose records are otherwise satisfactory may be admitted on a strict probationary basis.

While data from the Cooperative General Culture and Cooperative Reading tests are used as a part of the screening process, they do not play a part in a candidate's admission to the School of Education. These data are used to aid the individual candidate and his advisor in plotting the most appropriate course of professional study based on his skills and interests.

Following admission to professional study 
the student remains subject to course require- 
ments while maintaining the 1.3 grade point av- 
erage. Both of these devices continue to serve 
the screening function in that they eliminate those 
who cannot reach the minimum standards of suc- 
cess in teacher training. 

During training the candidates must meet 
three other screening situations to qualify for 
graduation and certification as a teacher. The 
first of these is a speech test which is adminis- 
tered jointly by the School of Education and the 
Department of Speech. Its purpose is to certify 
that the speech proficiency of the teacher candi- 
date is of a satisfactory standard for classroom 
work. Provision is made for remedial work for those who cannot qualify on the initial test. Occasionally this device may screen out individuals whose speech handicaps are such as to limit their efficiency in the classroom.

The Minnesota Multiphasic Personality Inventory serves as a second screening device during the period of professional training of teachers. Use of the inventory is limited to the detection of such individuals whose personality is unstable to the point of limiting their effectiveness in the classroom. Such individuals are referred to the Student Health Clinic for treatment and are counseled into other fields of work.

Finally, a candidate must present a certifi- 
cate of physical health and fitness from the Uni- 
versity Medical Examiner as an indication that 
no physical defects exist to limit the individual's success as a teacher.

When the candidate has successfully met each of these screenings the School of Education is willing to certify his success as a teacher by granting the University Teacher's Certificate. Through the use of the screening devices as described only those whose success as teachers is reasonably assured are retained for training and graduated with certification.


SECTION III 
GATHERING THE DATA 
A. The Study Group 


AS A BASIS for study of the selection procedures employed by the School of Education of the University of Wisconsin, the 1952 graduating classes (February, June and August) were chosen for study. Members of these classes have been teaching one or more years, thus giving an adequate basis for in-service success rating.

The combined membership of these three classes is 352; 134 were men and 218 were women. A preliminary survey of the group made in October, 1953, disclosed that 54 men and 133 women, or a total of 187, taught during the first year following graduation; a total of 165 did not teach: 80 were men and 85 women. Of the 80 men who did not teach, 30 were in military service. Of the 85 women who did not teach 27 were married.

The remaining non-teaching graduates may be grouped as follows:

1. Attended graduate school: 19 men, 9 women

2. Decided not to teach: 12 women

3. No record of employment, and no reply to two inquiries: 13 men, 9 women

4. Entered private industry: 11 men, 12 women

5. Other public employment: 4 men, 4 women

6. Unplaced: 3 men, 9 women


While this non-teaching group appears large, it is possible that many may eventually become teachers. Some of those now in service and in the Graduate School will doubtless enter the teaching profession later. Nevertheless, considering the totals involved, the non-teaching group appears large.

On further study of the 1952 graduating group it became apparent that much of the data used for screening was not available for those who transferred to the University of Wisconsin after a year or two of study elsewhere. These were not processed by the usual admission procedures, nor were the data of the orientation testing program available. Therefore, only those who had originally entered the University as freshmen and who had gone through the entire procedure of admission and screening were included in the study. Thus, 163 transfers to the University of Wisconsin were dropped from the study for lack of data, leaving 189 in the group to be studied.


STOELTING 121 


The placement records for this group provide the data presented in
Tables I, II, and III.


B. Methods Employed in Gathering the Data 


To facilitate gathering of the data, a special 4" x 6" card was
devised and printed for use in the study. One card was prepared for
each individual. On the top line of the card beginning with the left
margin the name of the individual was typed. The space immediately
below the name was reserved for the date of entry into the University
and a notation whether the individual was an original entry or a
transfer student. The upper right hand corner was used to record the
date of graduation and the individual's academic major and minors.
The space below Criterion Rank was used to record the details of the
individual's placement.

Since all test data were filed according to date of entry, the
transcript of each student's record was the logical starting point.
Transcripts of the graduates were made available by the School of
Education Dean's office. In addition to the date of entry the
transcript also contained high school rank and earned grade point
average data. Since rank in high school class had already been
converted to a percentile score, these data were simply transferred
to the record card. To compute the earned grade point average it was
necessary to count the number of credits and grade points earned and
to record them in fraction form to be calculated later.
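
The bookkeeping just described (grade points and credits recorded as a fraction, with the division done later) can be sketched in a few lines. This is an illustration only: the function name `earned_gpa` and the sample figures are hypothetical and are not taken from the study's record cards.

```python
from fractions import Fraction


def earned_gpa(grade_points: int, credits: int) -> Fraction:
    """Return grade points per credit as an exact fraction."""
    if credits <= 0:
        raise ValueError("credits must be positive")
    return Fraction(grade_points, credits)


# Hypothetical card entry: 93 grade points earned over 62 credits.
gpa = earned_gpa(93, 62)
print(gpa, float(gpa))
```

Keeping the value as a fraction until the end mirrors the card procedure and avoids rounding before the final division.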

With the date of entry available, the gather- 
ing of test data could go ahead since this data 
was filed according to the student's entry date. 


The test data included: 


1. Henmon-Nelson Psychological

2. American Council on Education Psycho- 
logical 

3. Cooperative Reading 

4. Cooperative General Culture 


The information on these tests was made available through the Student
Counseling Center. Since the data for each of the tests were already
in percentile rank form, the data were transferred directly to the
individual record card for each graduate in the study group. The
Student Counseling Center also furnished data on each individual's
predicted grade point average (based on a regression equation using
high school rank and percentile rank from the American Council on
Education Psychological examination to predict Grade Point Average).

Inasmuch as only the raw scores were avail- 
able for the Minnesota Multiphasic Personality 
Inventory, it was necessary to complete a pro- 
file and code for each member of the study group 
before the individual's score could be recorded. 


JOURNAL OF EXPERIMENTAL EDUCATION 


TABLE I


SUMMARY OF PLACEMENT OF THE 1952 STUDY GROUP OF THE
SCHOOL OF EDUCATION AT THE
UNIVERSITY OF WISCONSIN
(Survey of October, 1952)


Men employed in teaching positions              21
Women employed in teaching positions            77
                                                        98
Men employed in non-teaching positions          42
Women employed in non-teaching positions        49
                                                        91
Group Total                                            189


TABLE II


SUMMARY OF PLACEMENT OF THE 1952 STUDY GROUP OF THE
SCHOOL OF EDUCATION AT THE
UNIVERSITY OF WISCONSIN
IN TEACHING POSITIONS
(Survey of October, 1952)


Teaching Field  Men  Women  Total

Agriculture  2  2
Art Education  2  5  7
Business Education  1  1
Chemistry  1  1  2
Economics  1  1
English  1  10  11
French  2  2
Geography  1  1
History  3  1  4
Home Economics  26  26
Mathematics  1  1
Music  9  9
Natural Science  2  1  3
Physical Education  7  5  12
Recreation  7  7
Sociology  1  1
Speech  1  1
Speech Correction  1  6  7

Total Men Teaching  21

Total Women Teaching  77

Total Graduates Teaching  98




TABLE III


SUMMARY OF PLACEMENT OF THE 1952 STUDY GROUP OF THE
SCHOOL OF EDUCATION AT THE
UNIVERSITY OF WISCONSIN
IN POSITIONS OTHER THAN TEACHING
(Survey of October, 1952)


                                 Men   Women   Total

Decided Not to Teach               1       6       7
Graduate School (U of W)           5       2
                (U of Chicago)     1               8
Married                                   17      17
Military Service                  21       1      22
No Reply                           7       7      14
Other Public Employment            1       4       5
Private Industry                   6       5      11
Unplaced                                   7       7

Total Men Not Teaching            42
Total Women Not Teaching                  49
Total Graduates Not Teaching                      91


TABLE IV


CORRELATIONS OF FOUR CRITERIA OF TEACHING SUCCESS WITH TEST DATA
EMPLOYED IN SCREENING CANDIDATES FOR TEACHER TRAINING


                              Criteria of Teaching Success

                                                   Placement   Practice
                      In-Service   Departmental    Bureau      Teaching
Screening Data        Rating       Rating          Rating      Grades

ACE Psychological       -.027         .163           .073        .026
Reading                  .056         .240          -.106        .059
Social Problems         -.169         .105          -.032       -.060
History                 -.245         .101           .054       -.038
Literature              -.549         .087          -.112       -.039
Science                 -.176         .045           .000       -.061
Fine Arts               -.042        -.161          -.133       -.194




The data on the speech screening test were made available in the
office of Professor Gladys Borchers, Chairman of the Education-Speech
Committee. Ratings in their original form were: A - superior,
B - above average, and C - average (no student is certified with less
than a “C” rating, and is assigned remedial work until a “C” rating
is earned). To give these ratings numerical basis for statistical
purposes, an “A” was recorded as “5”, “B” as “4”, and “C” as “3”.

With the exception of the data from the Minnesota Multiphasic
Personality Inventory, all the data were in a form readily adaptable
to use in a correlational study. The MMPI data were not amenable to
such a study.

During the time screening data were being gathered, the data on
criteria of teaching success to which screening data will be related
were also being recorded. The criteria of teaching success include:


1. An in-service rating

2. A departmental rating 

3. A Placement Bureau rating 
4. Practice teaching grades 


To obtain the in-service rating for the 98 
graduates of the study group who were employed 
as teachers during the first year following grad- 
uation, a postal reply rating card was devised 
and printed (see Appendix F).* The rating is 
based on the individual’s performance in his first
year in teaching. 

These cards were mailed to the superintendents or principals of 95
teachers in the group. Ratings for three of the teaching group who
were employed as teachers of recreation by the American Red Cross
were not requested because of much shifting of assignments and the
lack of current information on their present situation; furthermore,
these individuals were not assigned in one location long enough to
give an accurate in-service rating. Wherever possible, the request
for a rating was sent directly to the superintendent or principal in
charge.

Within 14 days of the mailing date, 76 (80%) had been returned. At
the end of 30 days, 88 (93%) had been returned. Two of the remaining
seven for whom no rating was returned had not been placed as the
survey of placement had indicated. The remaining five were placed out
of the State of Wisconsin, and, lacking the name of their principal
or superintendent, no further effort was made to obtain an in-service
rating.
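
The return percentages quoted above follow directly from the counts in the text (95 cards mailed; 76 and 88 returned). A minimal check, with `return_rate` a name introduced here purely for illustration:

```python
def return_rate(returned: int, mailed: int) -> int:
    """Percentage of rating cards returned, to the nearest whole percent."""
    if mailed <= 0:
        raise ValueError("mailed must be positive")
    return round(100 * returned / mailed)


MAILED = 95  # cards sent to superintendents or principals

print(return_rate(76, MAILED))  # returns within 14 days
print(return_rate(88, MAILED))  # returns within 30 days
```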

The in-service ratings obtained through this means were recorded as
“5” for a superior rating, “4” for an above-average rating, “3” for
an average rating, “2” for a below-average rating, and “1” for an
inferior rating. By giving these ratings a numerical value it was
possible to make various statistical analyses of them.

A second criterion of teaching success consists of a departmental
rating. This rating is made by the faculty of each individual's major
department. The department's rating is an estimate of the
individual's potentialities as a teacher. Since a major department's
most important contact with the individual is through his classwork,
the rating may reflect heavily the individual's academic achievement
in his major subjects.

The departmental ratings for the 1952 graduating classes were not
uniform in the type of ratings employed. To produce as much
uniformity between the departmental ratings as possible each
department prepared a key for translating the scores into superior,
above average, average, below average, inferior ratings. These, as
with the in-service ratings, were recorded as numerical quantities
(“5” for a superior rating, “4” for an above average rating, etc.)
for direct use in the computations.

A Placement Bureau rating was the third criterion of teaching
success. This rating, made by the Assistant Director of the Placement
Bureau, depends on a group of factors not likely to appear in the
other criteria ratings. The following factors were said to be
involved in arriving at a rating:

1. Credentials: statements of observing officials, advisors,
   teachers, and supervisors regarding the individual's promise.
   Other information used here includes statements by the candidate
   himself regarding his interests, preferences, and ambitions.

2. Observations: appearance, attitudes and general adjustment of the
   individual are observed in a personal conference, in connection
   with his routine duties, and in social situations.

3. Reviews of practice teaching performance by the critic teachers.

4. Transcript is consulted for placement purposes only; it is not
   used for rating purposes. Grade point average is used for rating
   purposes only when very high or very low.

5. The departmental rating is considered only when very high or very
   low.


Ratings were provided on the superior, above average, average, below
average, inferior scale. The ratings were recorded on the same
numerical basis as the other ratings.

Only 141 ratings could be provided by the Placement Bureau since 48
in the study group did not register with the Bureau. No special
attempt was made to determine why these 48 did not register, but a
simple survey of the records disclosed that a large proportion were
those who decided not to teach, those women who were married and
decided not to seek placement, and the graduates who majored in
Recreation and were placed through other placement facilities.


* All references to Appendices may be found in the original thesis
filed in the Library, University of Wisconsin, Madison, Wisconsin.

A final criterion of teaching success consists of practice teaching
grades. These grades are based on each individual's attainment in two
practice teaching situations: one semester of practice teaching in a
minor academic field and one semester of practice teaching in the
major academic field. These grades do not appear separately on the
transcript but are available separately in the Student Teaching
records office. With separate major and minor practice teaching
grades available for each individual, the grades were averaged to
produce a single practice teaching grade.
Practice teaching grades are generally assigned on a superior,
average, inferior basis rather than an A, B, C grading system. It was
necessary, therefore, to assign the numerical values to these grades
as follows:


A (superior)                  5
A-, B+ (above average)        4
B (average)                   3
B-, C+ (below average)        2
C or below (inferior)         1


With these numerical values the ratings will be used in the
computations in the same way as the other criteria.
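
The conversion of practice teaching grades to the five-point scale amounts to a simple lookup. The sketch below is illustrative: the name `practice_grade_value` is hypothetical, and the particular letter grades treated as "below C" are an assumption, since the text says only "C or below".

```python
# Scale from the text: A = 5, A-/B+ = 4, B = 3, B-/C+ = 2, C or below = 1.
PRACTICE_GRADE_VALUES = {
    "A": 5,
    "A-": 4, "B+": 4,
    "B": 3,
    "B-": 2, "C+": 2,
    "C": 1,
    # Assumed spellings of grades below C; the source does not list them.
    "C-": 1, "D+": 1, "D": 1, "D-": 1, "F": 1,
}


def practice_grade_value(grade: str) -> int:
    """Map a practice teaching letter grade to its numerical rating."""
    key = grade.strip().upper()
    if key not in PRACTICE_GRADE_VALUES:
        raise ValueError(f"unrecognized grade: {grade!r}")
    return PRACTICE_GRADE_VALUES[key]


print(practice_grade_value("B+"))
```

Raising an error on an unrecognized grade, rather than defaulting to 1, keeps a clerical slip from being silently scored as "inferior".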
A single, over-all criterion of teaching success was derived from an
average of the four criteria described above. No weighting was given
the separate criteria: (1) since one individual was responsible for
each of the ratings, it is felt that the judgment of any individual
should not be emphasized more than the others; (2) with a straight
average to produce the criterion, no one factor in teaching success
is emphasized. It is felt that all the factors involved in arriving
at the individual ratings are contributory to teaching success, and
should be considered equally.
In computing the criterion, 80 of the total study group had all four
of the criteria available. An additional 66 of the total group had
three criteria available to formulate their criterion. For the
remaining 43 of the total group only two criteria were incorporated
in arriving at their criterion of teaching success.
To avoid a marginal criterion of teaching success rating for seven
individuals, it was necessary to give emphasis to a single rating.
Wherever these occur emphasis was given in the direction of the
in-service rating, if available; to the Placement Bureau rating if
the in-service rating was missing; or to the practice teaching grades
if both the in-service ratings and the Placement Bureau ratings were
not available.
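
The composite criterion and the handling of marginal cases can be sketched as follows. Several points here are assumptions rather than the author's stated procedure: a "marginal" rating is taken to mean an average falling exactly halfway between two scale points, ratings are whole numbers from 1 to 5, at least two ratings must be present, and the dictionary keys and function names are hypothetical.

```python
from fractions import Fraction
from math import floor

# Order of emphasis for marginal cases, following the text.
EMPHASIS_ORDER = ("in_service", "placement_bureau", "practice_teaching")


def composite_criterion(ratings: dict) -> Fraction:
    """Unweighted mean of whichever ratings are available (None = missing)."""
    available = [v for v in ratings.values() if v is not None]
    if len(available) < 2:
        raise ValueError("at least two ratings are needed")
    return Fraction(sum(available), len(available))


def criterion_category(ratings: dict) -> int:
    """Whole-number category for the composite criterion."""
    avg = composite_criterion(ratings)
    lower = floor(avg)
    if avg - lower != Fraction(1, 2):
        return floor(avg + Fraction(1, 2))  # ordinary rounding
    # Marginal case: lean toward the first available preferred rating.
    for name in EMPHASIS_ORDER:
        value = ratings.get(name)
        if value is not None:
            return lower + 1 if value > avg else lower
    return lower + 1  # assumption: round up when no preferred rating exists


example = {
    "in_service": 4,
    "departmental": 3,
    "placement_bureau": None,
    "practice_teaching": None,
}
print(composite_criterion(example), criterion_category(example))
```

Exact fractions are used so that the halfway test is not disturbed by floating-point rounding.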

These criteria of teaching success were considered separately in
correlation studies with the screening data, and then, combined into
a criterion of teaching success, were correlated again with the
screening data.


SECTION IV 
ANALYSIS OF THE DATA 
A. The Criteria of Teaching Success 


TO GET further data relative to the criter- 
ia of teaching success, intercorrelations were 
calculated among them. 

The correlation between the in-service rat- 
ings and the departmental ratings, based on 88 
cases for whom in-service ratings were avail- 
able, was .319. 
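
The coefficients reported in this section are presumably product-moment (Pearson) correlations, although the text does not name the formula. Under that assumption, a minimal sketch of the computation is given below; the function name and the sample ratings are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical five-point ratings for six teachers.
in_service = [5, 4, 3, 4, 2, 3]
departmental = [4, 4, 3, 5, 3, 2]
print(round(pearson_r(in_service, departmental), 3))
```

Only cases with both ratings present can enter such a computation, which is why the number of cases differs from one pair of criteria to the next.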

It is entirely probable that these ratings have only academic ability
as a common element. The department faculties were decidedly limited
in the aspects of teaching upon which their estimates could be based.
Academic ability was the one aspect with which these individuals were
most familiar. The in-service ratings depended upon this and other
qualities as well.

While there is little relationship between these data it is felt that
both areas covered by the ratings are of importance in the training
of a teacher as well as in success in teaching.

The highest correlation between any two criteria of teaching success
was .627 for 84 cases based on the in-service and Placement Bureau
ratings. A strong similarity in what the ratings attempt to measure
doubtless accounts for the relatively high correlation. In both
ratings the academic record is consulted, but not emphasized.
Furthermore, in both the ratings, personality becomes a matter of
considerable importance. Such matters as adjustment in social
situations, interest in people, attitudes toward community
responsibilities, and general personal appearance are important
factors in both the in-service and the Placement Bureau ratings.

The relationship between the in-service ratings and practice teaching
grades, based on 88 cases, was .327, which is low. It is interesting
that there should be so little in common in the two ratings. It is
possible that a teacher's ability to organize and discharge a set of
duties comprising an actual teaching position is different from that
provided by practice teaching. It would seem on the basis of the low
correlation here derived that a study should be made of the factors
producing such a result.

The correlation involving 141 cases between the departmental ratings
and Placement Bureau ratings was .472.

While both ratings make use of academic 
achievement as part of the rating, the Place- 
ment Bureau would appear to have recognized 
the importance of the individual’s personality in 
teaching. 

Further differences in the ratings emphasize the supplementary
character of the ratings. An individual’s ability to adapt himself to
a specific job, to a school organization, and to a community is of
much importance in the Placement Bureau rating, while the
departmental rating is not so much concerned with this factor. A
third important difference in the ratings concerns the type of
performance each is concerned with; the Placement Bureau rating is
alert to performance in leadership and organization of social
services while the departmental rating depends largely on academic
performance.

A correlation of .551 involving 189 cases 
indicates considerable similarity between the 
departmental rating and practice teaching grades. 
It appears likely that both of these ratings de- 
pend heavily on academic achievement. 

It appears that the relatively high correla- 
tion of departmental-practice teaching ratings 
may offer some clue to the inability of practice 
teaching grades to predict in-service success. 
Greater similarities occur between practice 
teaching grades and departmental ratings than 
practice teaching grades and in-service ratings. 
It is likely, then, that greater emphasis in prac- 
tice teaching grades is being placed on the aca- 
demic aspects of teacher preparation rather than 
leadership and organizational factors consider- 
ed necessary to succeed on the job. 

The Placement Bureau rating serves as a transitional rating between
training for teaching and actual teaching in the field. This rating
correlates well with measures of in-service success, emphasizing the
practical aspects of teaching, and also correlates well with measures
of teacher success taken during preparation for teaching, emphasizing
the theoretical and academic aspects of teaching. A measure of
teaching success involving the factors used in the Placement Bureau
rating warrants further investigation for possible adaptation to
pre-training selection purposes.

The value of the departmental ratings and practice teaching grades
seems to be low for purposes of predicting in-service success. These
ratings probably emphasize the academic and theoretical factors in
teacher training as a basis for measuring teacher success. As a
result of the emphasis, however, they reflect teacher success in
training, and justify their use as criteria of success.

Since all of the elements used to arrive at the criteria ratings are
of importance in some phase of teacher preparation and teaching, and
teacher selection needs be concerned with all these aspects, a
straight average of the criteria is used to produce a composite
criterion of teacher success. It is assumed that this criterion of
teacher success will reflect all these important elements in weighing
the ability of a screening device to discriminate between levels of
teaching ability.

In an effort to get further data on the interrelationships of the
criteria of teacher success a multiple R was calculated using the
departmental rating, Placement Bureau rating, and practice teaching
grades to predict in-service success. The R was .629. Comparison of
this figure with the intercorrelation figures on the teaching success
criteria will show that the three ratings used together to predict
in-service success are no better than the rating used by the
Placement Bureau alone. It further indicates that what is being
measured in the departmental ratings and practice teaching grades has
no particularly significant relationship to in-service teaching
success.
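
The point that the added ratings contribute almost nothing can be illustrated with the standard two-predictor formula for a multiple correlation. The sketch below is not the author's three-predictor computation (the intercorrelation of Placement Bureau ratings with practice teaching grades is not reported in this section); it uses only the departmental and Placement Bureau ratings, with the coefficients .319, .627, and .472 taken from the text. The function name is hypothetical.

```python
from math import sqrt


def multiple_r_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple R of a criterion y on two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Criterion: in-service rating.
# Predictor 1: departmental rating (r = .319 with the criterion).
# Predictor 2: Placement Bureau rating (r = .627 with the criterion).
# The two predictors correlate .472 with each other.
R = multiple_r_two_predictors(0.319, 0.627, 0.472)
print(round(R, 3))
```

Under these figures the two-predictor R comes to roughly .63, scarcely above the .627 obtained from the Placement Bureau rating alone, which agrees with the conclusion drawn from the reported three-predictor R of .629.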


B. Correlations of Screening Data With the 
Criteria 


The data used for screening may conven- 
iently be divided into three groups; standardized 
test data, academic achievement data, and 
speech proficiency test data. Table IV shows 
the correlations of the standardized test data 
with the four criteria of teaching success. Table 
V shows the correlations of the same test data 
with a criterion of teaching success. 

Examination of the correlations in Tables IV and V, the relationship
between standardized test data and the criteria of teaching success,
discloses that only two correlations in both tables are very
different from zero. The first of these, namely, the correlation
between the Henmon-Nelson psychological scores and in-service
success, is quite low. The other correlation, the -.549 between
literature scores and in-service success ratings, would not generally
be acceptable as evidence of teacher acceptability.

It must not be assumed that the evidence given in Tables IV and V is
proof that the areas covered by the tests are not important in
successful teaching. Even the least well prepared may appear adequate
to rating officials, thus little or no correlation. Then, too, these
particular instruments possibly cannot be relied upon to screen
teacher candidates. Other instruments in the same areas may be able
to perform the screening function adequately where the instruments
under study here have failed.

Reference to Table VI will show that data on academic achievement is
promising for use in screening. While the correlations are not high,
the data can be used with a reasonable degree of validity.


TABLE V


CORRELATIONS OF A CRITERION OF TEACHING SUCCESS
WITH TEST DATA EMPLOYED IN SCREENING CANDIDATES
FOR TEACHER TRAINING


                                    Criterion of
Screening Data                      Teaching Success

Henmon-Nelson Psychological              .139
ACE Psychological                        .103
Reading                                  .172
Cooperative General Culture
   Social Problems                      -.054
   History                              -.005
   Literature                           -.035
   Science                               .003
   Fine Arts                            -.206


TABLE VI


CORRELATIONS OF FOUR CRITERIA OF TEACHING SUCCESS WITH
ACADEMIC ACHIEVEMENT DATA EMPLOYED IN SCREENING
CANDIDATES FOR TEACHER TRAINING


                                                         Placement   Practice
                            In-Service   Departmental    Bureau      Teaching
Screening Data              Rating       Rating          Rating      Grades

High School Rank               .221         .205           .199        .237
Predicted Grade Point
   Average*                    .047         .309           .166        .115
Earned Grade Point
   Average (4 semesters)*      .385         .335           .302        .375


*Correlation between Predicted and Earned Grade Point Average = .570.


It will be noted that the correlations of High School Rank with the
various criteria of teaching success are low; when High School Rank
is correlated with the composite criterion of teaching success, the
correlation becomes worthy of consideration. It is doubtful, however,
if the r of .270 is high enough to justify the use of High School
Rank as a screening instrument. Possibly its use may be justified if
the severe limitations imposed by the low validity are observed.

Predicted Grade Point Average does not appear to qualify for use as a
screening device. Only the correlation with the departmental rating
is considerable. Its correlation with Earned Grade Point Average is
.570, which is always a consideration in the training of teachers.

The Earned Grade Point Average has cor- 
relations with the criteria which are somewhat 
higher, more consistent and probably more use- 
ful as a screening instrument. 

Correlation with the criterion is higher than with each of the
criteria separately, indicating that the Earned Grade Point Average
can predict moderately well over a wide range of measures of teaching
success.

Since Earned Grade Point Average is the one screening device capable
of discriminating between levels of teaching ability, its use might
possibly be broadened to include other devices. Earned Grade Point
Average is now used as a basis for admission to professional study in
the School of Education, a 1.3 grade point average being the minimum.
This might well be carried on through the final two years of
preparation. A separate requirement might be set up to apply to
professional courses. A minimum required 1.8 grade point average, for
example, could be used to screen teacher candidates for higher
professional standards, while a 1.3 grade point minimum could remain
for all other courses. Such adaptations could broaden the use of the
only valid screening included in this study.

Reference to Table VIII will show that the Speech Proficiency test is
not capable of predicting teacher success, there being a near zero
relationship between speech scores and success in teaching. This
certainly does not mean, however, that the speech test no longer
serves an important function. The low correlation probably arises out
of the fact that extreme cases have been removed from the teacher
preparation program or that the deficiency has been overcome. Those
that meet this standard seem adequate.

The use of the speech proficiency test as a screening device probably
should be continued to insure minimum speech proficiency. As such, it
will not be necessary to rate the individual, but merely to certify
that he meets minimum standards, or to withhold certification until
he becomes qualified through remedial work.

In order to further describe the efficiency with which the various
selective devices operate, the correlations between the selective
devices and the various criteria were converted into an efficiency
score through the use of the Predictive Efficiency formula. In this
formula, $\text{Pre. Eff.} = 1 - \sqrt{1 - r^2}$, where
$\sqrt{1 - r^2}$ is the Coefficient of Alienation. The coefficient
gives a basis for deciding how high a correlation must be in order to
be satisfactory for predictive purposes (24:115). This subtracted
from 1 gives a decimal fraction which can be treated as a percentage.
A predictive efficiency percentage above 90 is regarded as high,
between 10 and 90 as moderate, between 5 and 10 as low, and below 5
as negligible.

The only correlations whose predictive efficiency was better than 5%
were between the earned grade point average and the criteria of
teaching success. The correlation between earned grade point average
and the criterion of teaching success yielded a 9% predictive
efficiency; between earned grade point average and the in-service
rating, 8% predictive efficiency; between earned grade point average
and the departmental rating, 6% predictive efficiency; and between
earned grade point average and practice teaching grades, 7%
predictive efficiency. All other predictive efficiency scores were
less than 5%, thus not reliable for predictive purposes.
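
The percentages above can be checked against the correlations tabulated for Earned Grade Point Average, taking predictive efficiency as one minus the coefficient of alienation, that is, 1 - sqrt(1 - r^2). The sketch below is illustrative; the function and dictionary names are hypothetical, while the correlation values are those reported in Tables VI and VII.

```python
from math import sqrt


def predictive_efficiency(r: float) -> float:
    """Predictive efficiency in percent: 100 * (1 - sqrt(1 - r^2))."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return 100 * (1 - sqrt(1 - r * r))


# Correlations of Earned Grade Point Average with each criterion.
EARNED_GPA_CORRELATIONS = {
    "composite criterion": 0.407,
    "in-service rating": 0.385,
    "departmental rating": 0.335,
    "Placement Bureau rating": 0.302,
    "practice teaching grades": 0.375,
}

for criterion, r in EARNED_GPA_CORRELATIONS.items():
    print(f"{criterion}: {predictive_efficiency(r):.1f}%")
```

Rounded to whole percentages, these figures reproduce the 9, 8, 6, and 7 per cent quoted in the text, with the Placement Bureau rating falling just under the 5 per cent line.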


C. The Minnesota Multiphasic Personality In- 
ventory and Teaching Success 


The Minnesota Multiphasic Personality Inventory was included as a
part of the screening program by the School of Education in order to
detect individuals with personalities such as to limit their
effectiveness in the classroom. The data used in this study concern
only those whose code score met the standards considered to be
adequate for teaching. Accordingly, with their elimination those
remaining should be adequate as shown by subsequent results.

The data, when classified according to the categories of teaching
success (namely, Superior, Above Average, Average, Below Average, and
Inferior, based on the criterion of teaching success), yielded no
discernible personality patterns. Personality codes which appeared
among teachers judged to be inferior were found in equal or greater
proportion among the other categories of teaching success.
Furthermore, personality codes indicating a mild maladjustment
appeared as frequently among the average, above average, and superior
teachers as was the case among the below average or inferior.

Further investigation in the use of this instrument is possible, but
beyond the scope of the present investigation. The responses on the



TABLE VII


CORRELATIONS OF A CRITERION OF TEACHING SUCCESS
WITH ACADEMIC ACHIEVEMENT DATA EMPLOYED IN
SCREENING CANDIDATES FOR TEACHER TRAINING


                                    Criterion of
Screening Data                      Teaching Success

High School Rank                         .270
Predicted Grade Point Average            .201
Earned Grade Point Average
   (4 Semesters)                         .407



TABLE VIII


CORRELATIONS OF A SPEECH PROFICIENCY TEST
USED IN SCREENING CANDIDATES FOR TEACHER
TRAINING WITH FOUR CRITERIA AND A
CRITERION OF TEACHING SUCCESS


                                         Criterion of
Screening Data                           Teaching Success

Speech                                        .119
Speech: In-Service Rating                     .011
Speech: Departmental Rating                   .253
Speech: Placement Bureau Rating               .069
Speech: Practice Teaching Grades              .221
Speech: Composite Criterion of
   Teaching Success                           .179





test may be analyzed item by item to determine what items, if any,
are able to discriminate between different levels of teacher success.
Thus, while the test as a whole is not a valid screening instrument,
separate items within the test may be found entirely valid for use in
screening. The data collected on the candidates for teacher training
through the Minnesota Multiphasic Personality Inventory indicate that
this test is incapable of predicting teacher success.


D. Conclusions of the Study 


On the basis of the data described in the 
foregoing pages, answers are proposed to the 
basic questions involved in this study: 


1. How well do present selection procedures
   discriminate between the superior teacher
   and the teacher likely to meet with only
   limited success?


On the basis of present selection procedures none of the standardized
tests used appear capable of predicting future teacher success. These
include the Henmon-Nelson psychological test, the American Council on
Education psychological test, the Cooperative Reading test, and the
Cooperative General Culture test. Since the relations between scores
earned on these tests and eventual success in the profession are as
low as they are, these tests would appear to eliminate potentially
successful teachers as well as unsuccessful ones.

Academic achievement data holds some 
promise for screening of teacher candidates, and 
the standards might be increased in this respect, 
but this will need to be done with care. Earned 
grade point average appears to be the most use- 
ful instrument in this group, and in the entire 
screening program for that matter, for predict- 
ing teacher success. As has been suggested 
earlier, the use of the overall grade point aver- 
age may be broadened to include other devices 
for screening, in addition to raising or lowering 
the minimum as the occasion demands. 

The use of High School Rank for screening purposes, so far as the
data here presented show, appears of doubtful value. Although the
correlation of this device is larger than most, it appears low for
predictive purposes, particularly after a preliminary selection has
been made on the basis of grade point average. Its use should
probably be restricted to that of providing supplementary
information.

The use of the Speech Proficiency Test should be continued in the
screening program, at least for certification. It is important that
teacher candidates be certified for minimum speech attainment
necessary for classroom success. It is probably not necessary to rate
candidates above the minimum requirements.


2. Under what circumstances do se- 
lection devices now employed per- 
mit admission of individuals not 
likely to succeed? 


Of the 189 total group studied, only 24 were judged on the basis of
the criterion to have achieved less than average success. This group
was obviously permitted to enter training and become teachers in
spite of the screening procedures employed. Since no follow-up, other
than the in-service rating, was conducted on the group it is not
possible to determine why these 24 met with limited success.

Of the total below-average group seven were admitted to professional
study with earned grade point averages well below 1.3. An additional
12 were admitted with grade point averages between 1.30 and 1.50.
These two sub-groups constitute 79% of the total below-average group,
indicating that an academic basis exists for their low rating.

However, 4 in the below-average group had 
earned grade point averages above 2.00. Thus, 
it appears that while the earned grade point ау- 
erage is a valid measure of teaching success, it 
is not sufficient in and of itself. Further, ехрег- 
imental study will be necessary to discover other 
valid factors in teachirg success to be combined 
with Earned Grade Point Average in animproved 
screening program, capable of isolating those 
individuals not likely to succeed in teaching. 


3. Is there basis for raising or lower-
ing the standards by which candi-
dates are admitted to pre-service
training and certification as teach-
ers?


If the minimum grade point average were increased from the
1.30 now being employed to 1.50, the higher minimum would
screen out 13 of the 24 who were judged to be of less than
average teaching ability. But at the same time, such an
increase would eliminate 31 who were rated as average, 7
rated above average, and 1 rated superior. Thus, it becomes
evident that a change of the grade point average minimum
will not alone be the solution to more adequate screening.
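
The trade-off just described can be worked through as simple
arithmetic. The sketch below is illustrative only: the counts are
those quoted in the paragraph above, while the function name and
the summary measures are assumptions introduced here and are not
part of the original study.

```python
# Counts quoted in the text for raising the minimum grade point
# average from 1.30 to 1.50.
BELOW_AVERAGE_TOTAL = 24      # teachers judged below average
BELOW_AVERAGE_SCREENED = 13   # of those, removed by the higher minimum

# Teachers rated average or better who would also be removed.
OTHERS_SCREENED = {"average": 31, "above average": 7, "superior": 1}


def screening_tradeoff(caught, lost_by_rating, target_total):
    """Summarize what a higher cutoff gains and what it costs.

    caught         -- below-average teachers removed by the cutoff
    lost_by_rating -- acceptable teachers removed, keyed by rating
    target_total   -- all below-average teachers in the group
    """
    lost = sum(lost_by_rating.values())
    removed = caught + lost
    return {
        "removed": removed,
        "lost_acceptable": lost,
        "share_of_removed_below_average": caught / removed,
        "share_of_below_average_caught": caught / target_total,
    }


summary = screening_tradeoff(
    BELOW_AVERAGE_SCREENED, OTHERS_SCREENED, BELOW_AVERAGE_TOTAL
)
for name, value in summary.items():
    print(name, value)
```

On these figures the higher minimum would remove 52 candidates, of
whom 39 were rated average or better, which is the basis for the
conclusion that the cutoff alone is not an adequate screen.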


4. At what point in the teacher educa-
tion program is the screening for
teacher education likely to be most
effective?


It is apparent that prediction of teacher success becomes
easier and more accurate as more information about the
candidate becomes available. The most accurate prediction
can be made at the time of graduation and certification,
based on a 22% predictive efficiency of the Placement
Bureau ratings. But since it is important for the efficient
use of time and facilities to make a prediction of success
as early as possible, a balance must be effected. Thus, the
ideal time occurs when the decision to admit or reject is
made early enough to allow a rejectee ample time to choose
a new course without a great loss of credits, and late
enough to determine the earned grade point average on which
a reasonable judgment on future success in teaching may be
based.

There is no decisive evidence on which selection of a
point of most effective screening may be based. It is
possible that the time of application for admission to
professional study may be most effective, subject to
further study.

It must be pointed out here that no adequate screening
program will function properly with only one on-the-spot
screening. Such screening must be supplemented by
continuous selection procedures both before and after
admission to professional study. These would include
active supervision of academic progress and the course of
study, periodic counseling, a speech proficiency test, a
personality test, a physical examination, interview, and
standardized tests. A program such as this would be
effective because it allows time to gather adequate
information about an individual on which to base admission,
while at the same time providing for increasing standards
of attainment necessary for well-qualified teachers.


5. What recommendations, based on 
the findings of the study, can be 
made for improved procedures for 
the selection of candidates for 
teaching? 


a. It is entirely possible that standardized tests are now
available which might be used in the screening program to
replace those not now competent for use in screening. The
literature on screening of candidates for teacher training
gives evidence of many standardized devices now in use,
though there is no conclusive proof of their relation to
teacher success.
b. The use of the Grade Point Average may be broadened to
include a specific minimum for professional courses. Since
Grade Point Average has been demonstrated as an effective
screening device, standards of more intensive preparation
may be possible with a device such as this.

c. The use of subjective techniques may be validated for
screening purposes. Techniques such as the group interview,
group dynamics situations, and observation under social
pressure are now in use and under study by a number of
teacher training institutions. While their use seems
promising, continuous research to check on the results is
necessary before they can be depended upon.

d. As an aid in the study of experience and personal
characteristics in a teacher candidate, a record system
following the individual through his four years of
preparation for teaching may be useful. Teacher training
institutions now using this device on an experimental basis
find that observations going back into secondary school
make significant contributions in the screening of
successful teacher candidates.

e. Further research is needed on the characteristics of the
successful teacher, in particular on how personal cultural
pattern, philosophy, and system of values combine in a
successful teacher.




DIFFERENTIAL METHODS OF SOLVING SELECTED
PROBLEMS ON THE ACE PSYCHOLOGICAL
EXAMINATION*


LEONE ANDERSON, RICHARD RANKIN, JOY RICHARDSON, 
JULIUS SASSENRATH, JULIUS THOMAS 
University of California at Berkeley 


TWO EXPERIMENTERS using eye movement records have attempted
to uncover differential problem solving methods employed by
high and low performers. Anselmo (2), using Number Series
problems, and Greening (4), employing Figure Analogies
problems, found that high performers took less time than
low performers in the solution of the problems. Both
experimenters concluded that the average duration of
fixation pauses was slightly less for high performers, but
the difference in methods of attacking the problems
remained undetected. In an effort to improve on and expand
these earlier investigations, an attempt is made to analyze
more thoroughly the eye movement data, and also to obtain
verbal recordings of subjects during their solution of the
problems.


Problem 


The experimenters seek to disclose the differential problem
solving processes employed by high and low performers,
respectively, in the solution of Number Series and Figure
Analogies problems from the ACE Psychological Examination
(1). The following specific questions were investigated:

a. Do high performers exhibit fewer fixations and
regressions than low performers?

b. Do high performers have a total duration of fixations
and regressions, respectively, which is less than the low
performers?

c. Do high performers fixate more on the pattern of the
problem than on the options; i.e., do they establish the
pattern before selecting an option as an answer, whereas do
low performers have an equal number of fixations on the
pattern and options of the problem?

d. Do high performers on Number Series problems shift more
readily than low performers from one arithmetic process
required to solve a problem to another process for a
subsequent problem; i.e., shifting from addition to a
combination of addition, multiplication, and division?


* This experiment was conducted under the guidance of
Professor Luther C. Gilbert, in the Educational Psychology
Laboratory at the University of California, Berkeley.

** Gray Audograph, Hartford, Connecticut.


Methodology


The ACE Psychological Examination, in accordance with the
instructions (1), was administered to two classes in
Introductory Educational Psychology. From this population
of 220 students those who missed fewer than six or more
than 19 of the 30 problems on the ACE Number Series
Sub-test were defined as the high and low performers,
respectively, on Number Series. Those students who missed
fewer than eight or more than 19 of the 30 problems on the
ACE Figure Analogies Sub-test were defined as the high and
low performers, respectively, on Figure Analogies. This
method of selecting extreme performers ultimately provided
the following number of subjects in the four groups: low
Number Series (N = 5); high Number Series (N = 9); low
Figure Analogies (N = 6); and high Figure Analogies
(N = 11).

Eye movements were photographed by a corneal reflection
type camera. The developed film was projected on a
replication of the problems and the data were then
tabulated. A detailed description of the camera and
procedure is given by Gilbert (3).

A disc audograph** was used to record the subjects'
verbalizations of the step by step progression of what they
perceived and thought in their attempt to solve the
problems. Data were compiled from the verbal recordings by
developing a rating scale with dichotomous ratings
(+ or -).

The following 11 items comprised the rating scale used in
evaluating verbal responses to the Figure Analogies
problems:

1. Identifies similarities only

2. Identifies differences only

3. Identifies similarities and differences

4. Uses mathematical concepts

5. Proceeds from an incomplete and/or inac-
curate recognition of constants (those as-
pects of the figure which remain the same)
and variables (those aspects of the figure
which change)





6. Develops an answer only through examina- 
tion of the rule; then proceeds to options 
(Solution by Analysis) 

7. Develops an answer by elimination of op- 
tions (Solution by Elimination) 

8. Does not apply relationship once having de- 
scribed it 

9. Answers with correct solution, and (a) cor-
rects error, (b) does not correct error,
(c) makes no error

10. Answers with incorrect solution, although 
(a) corrects error, (b) does not correct 
error 

11. Presents no solution 


The following 13 items comprised the rating 
scale used in evaluating verbal responses to the 
Number Series problems: 

1. Identifies numbers only 

2. Identifies arithmetic relationship only 

3. Identifies numbers and arithmetic relation- 
ships 

4. Proceeds by insightful technique; i.e., 
notes rules after verbalizing only one to 
four numbers and/or relationships; then 
answers 

5. Proceeds by repetitious technique; i.e., 
(a) notes five to seven numbers and/or re- 
lationships; then answers, (b) re-exam- 
ines the numbers and/or relationships al- 
ready verbalized; then answers 

6. Develops an answer only through examina- 
tion of the pattern; then proceeds to op- 
tions (Solution by Analysis) 

7. Develops an answer by elimination of op- 
tions (Solution by Elimination) 

8. On first examination proceeds from an in- 
complete and/or inaccurate recognition of 
the rule 

9. On first examination notes new arithmetic 
relationships not present in preceding prob- 
lems 

10. Requires more than one examination to 
note new arithmetic relationships not pres- 
ent in preceding problems 

11. Answers with correct solution and (a) cor- 
rects error, (b) does not correct error, 
(c) makes no error 

12. Answers with incorrect solution, although
(a) corrects error, (b) does not correct
error

13. Presents no solution 
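
The selection of extreme performers described under Methodology
can be sketched as a simple threshold rule. This is an
illustration, not the authors' procedure as implemented: the cutoff
values come from the text, while the function name, the constant
names, and the sample counts of missed problems are hypothetical.

```python
# Cutoffs stated in the text, in problems missed out of 30.
NUMBER_SERIES_HIGH_BELOW = 6      # fewer than six missed -> high
FIGURE_ANALOGIES_HIGH_BELOW = 8   # fewer than eight missed -> high
LOW_ABOVE = 19                    # more than 19 missed -> low (both sub-tests)


def classify(missed, high_below, low_above=LOW_ABOVE):
    """Return 'high', 'low', or None for one student's missed count.

    Students between the two cutoffs are not extreme performers
    and are left out of the laboratory sample (None).
    """
    if missed < high_below:
        return "high"
    if missed > low_above:
        return "low"
    return None


# Hypothetical counts of missed Number Series problems.
missed_counts = [3, 5, 6, 12, 19, 20, 27]
groups = [classify(m, NUMBER_SERIES_HIGH_BELOW) for m in missed_counts]
print(list(zip(missed_counts, groups)))
```

Note that a student who missed exactly six Number Series problems,
or exactly 19 on either sub-test, falls in neither group under the
wording of the text.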


The problems to be used with the eye movement camera and
verbal recorder were chosen from different editions of the
ACE in order to minimize the effect of practice. The nine
problems to be solved before the camera were selected by an
item-difficulty analysis of the Number Series and Figure
Analogies Sub-tests of the 1941 edition of the ACE.
Assuming that the correspondingly numbered problems in the
1940 edition of the ACE had the same respective degrees of
difficulty, problems were selected from this edition for
use with the verbal recorder. For purposes of analysis the
individual problems were divided into two parts: (a) the
pattern, i.e., the initial part of the problem establishing
the rule, and (b) the options, i.e., the multiple choice
selection of answers. The problems, shown in the
accompanying table, were chosen and classified as easy,
medium, or difficult.

The laboratory procedure was introduced to each subject
with a brief explanation of the purpose of the experiment
and the principles of the camera and audograph. Each
examinee was then (a) adjusted before the camera, (b) given
instructions adapted from the ACE, (c) asked to solve two
practice problems, and (d) individually presented the nine
problems as an untimed test. Upon completing the camera
procedure the subject was (a) seated before the audograph,
(b) given instructions adapted from the ACE, (c) presented
two practice problems to be solved verbally, and
(d) administered the problems as an untimed test, to which
the subject responded by verbalizing thought processes in
attempting a solution.


Discussion of Results 


From Table I it can be seen that the mean number of
fixations and regressions for Number Series problems is
less for high performers than for low performers. This
appears to differ from Anselmo's (2) findings, but may be
accounted for by the more stringent selection of subjects
and problems for this experiment. Mean total duration of
fixations and regressions for Number Series problems also
indicates that the high performers spend less time than low
performers. However, the mean number of correct answers for
Number Series problems is not very different for the low
and high performers. Thus without controlling the time
variable, the two groups performed equally well. This was
not true when these subjects were administered the ACE as a
timed test from which they were selected as high and low
performers, respectively.
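
The Number Series comparison above can be restated as ratios of
the high performers' means to the low performers' means. The
sketch below copies the Number Series figures from Table I; the
dictionary layout and function names are assumptions made for
illustration, and reading the duration unit as thirtieths of a
second follows the "1/30 Sec." column label.

```python
# Number Series rows of Table I as (low performers, high performers).
TABLE_I_NUMBER_SERIES = {
    "fixations_per_problem": (37, 21),
    "regressions_per_problem": (29, 8),
    "fixation_duration_30ths": (444, 287),
    "regression_duration_30ths": (186, 121),
    "correct_answers": (5.4, 5.9),
}


def high_to_low_ratio(measure):
    """Ratio of the high performers' mean to the low performers' mean."""
    low, high = TABLE_I_NUMBER_SERIES[measure]
    return high / low


def to_seconds(thirtieths):
    """Convert a duration in units of 1/30 second to seconds."""
    return thirtieths / 30.0


ratios = {
    name: round(high_to_low_ratio(name), 2) for name in TABLE_I_NUMBER_SERIES
}
print(ratios)
print(to_seconds(444), to_seconds(287))
```

A ratio well below 1 for fixations and durations, together with a
ratio near 1 for correct answers, is the pattern the paragraph
describes: less looking time for high performers with little
difference in untimed accuracy.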

Different results emerge (Table I) for the two groups
tested on Figure Analogies problems. Except for mean number
of correct answers, there appears to be little or no
difference between any of the measures. This lack of
differences between high and low performers is contrary to
Greening's (4) conclusions. The apparent difference between
mean number of correct answers indicates that even on an
untimed basis the high performers are superior. Thus the
Figure Analogies problems tended to function as a power
test, while this was not true of the Number Series
problems. The design of the Number Series and Figure
Analogies problems does not, of course, permit a comparison
of the data on these two types of problems.


                                             Easy   Medium       Difficult
(Camera)    Number Series    (ACE 1941)      2,     16, 17, 18   28, 29, 30
(Camera)    Figure Analogies (ACE 1941)      2,     16, 20, 21   28, 29, 30
(Audograph) Number Series    (ACE 1940)      2      16, 17, 18   28, 29, 30
(Audograph) Figure Analogies (ACE 1940)      2      16, 20, 21   28, 29, 30


Figure 1 indicates that for the three levels of difficulty
of the Number Series and Figure Analogies problems, the
high performers fixated more on the pattern than did the
low performers. Moreover, both groups fixated more on the
pattern than on the options of all the problems, with the
exception of the low performers on Figure Analogies.

As the Number Series and Figure Analogies problems
increased in level of difficulty, from easy to medium, both
the high and low performers spent a similarly greater
percentage of fixations establishing the pattern of the
problem. Specifically, the two ΔFA differentials
approximate each other, as do the two ΔNS differentials
(Figure 1). However, with those problems which increased
from a medium to a difficult level, the low performers
fixated less on the pattern, thus relying more on an
eliminative method of selecting an answer from the options.
In contrast, the high performers in their solution of the
difficult problems increased the number of their fixations
on the pattern, thus indicating a more analytic method of
problem solving. Here the differential between the ΔNS
values is less than that between the ΔFA values.

Presumably, the effectiveness of the high performers'
method of problem solving is not reduced by the increasing
difficulty levels of those problems administered. Rather it
demonstrates a proportionately greater intensity of
analysis. (Intensity is here taken as a function of the
percentage of fixations on the pattern.)

Low performers, however, in exhibiting a maximum percentage
of fixations on the pattern for the medium level problems
and a regression to a lesser percentage for the difficult
problems, appear to execute a problem solving method
characterized by a simple integration of past experience
and the immediate solution of the problem. Their method
seemingly cannot be intensified beyond the medium level
problems. This regressive nature of the low performers'
method in comparison with the progressive nature of the
high performers' method may be a significant differential
for discriminating problem solving attacks.

The percentages of fixations on the patterns and options
were further analyzed by computing the mean percentages of
consecutive fixations. The first set of consecutive
fixations on the pattern may indicate the subjects' first
attempt to establish the rule, whereas the first set of
consecutive fixations on the options may indicate the
subjects' first attempt to find a solution among the
alternatives. The second and third sets of consecutive
fixations may be second and third attempts to establish the
rule or a verification of the first answer. These data were
compiled in Table II.
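
One way to picture the analysis of consecutive fixations is to
treat each subject's record as an ordered sequence of fixations
labeled as falling on the pattern or on the options, and to express
each unbroken run as a percentage of that subject's total
fixations. The article does not give the computational details, so
the sketch below is an assumed reading: the labels "P" and "O", the
function name, and the sample sequence are all hypothetical.

```python
from itertools import groupby


def consecutive_run_percentages(sequence):
    """Express each run of consecutive fixations as a percent of the total.

    sequence -- ordered fixation labels, "P" (pattern) or "O" (options).
    Returns {"P": [...], "O": [...]}, where the first entry in each
    list is the first examination, the second entry the second
    examination, and so on. Any other label raises KeyError.
    """
    labels = list(sequence)
    total = len(labels)
    runs = {"P": [], "O": []}
    for label, group in groupby(labels):
        run_length = len(list(group))
        runs[label].append(100.0 * run_length / total)
    return runs


# Hypothetical record: 5 pattern fixations, 2 option fixations,
# 3 pattern, 1 option, 1 pattern (12 fixations in all).
record = "PPPPPOOPPPOP"
runs = consecutive_run_percentages(record)
print([round(x, 1) for x in runs["P"]])
print([round(x, 1) for x in runs["O"]])
```

Averaging the first, second, and third entries over subjects and
problems would give figures of the kind reported in Table II, under
the assumption stated above.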

On Number Series, for high and low performers in the first
examination, the mean percentage of consecutive fixations
on the patterns (P) was four to five times more than on the
options (O). In the second examination both high and low
performers fixated consecutively for an average percentage
that was two to three times more on the patterns than on
the options, while in the third examination both high and
low performers fixated consecutively for a more nearly
equal mean percentage on patterns and options. High
performers in the first and second examinations fixated
consecutively on the patterns for about as great a mean
percentage of the total number of fixations as the low
performers. But in the third examination of the patterns
the low performers fixated consecutively for a mean
percentage that was three times greater than that of the
high performers. Similarly, in the first and second
examinations of the options the high performers fixated
consecutively for about as great a mean percentage as the
low performers. Again, in the third examination of the
options, the low performers fixated consecutively for a
mean percentage of the total number of fixations that was
seven times larger than the high performers' mean
percentage. Therefore, in summation, the high performers
may be said to progress with a more thorough pattern
analysis, as indicated by the greater mean percentage of
consecutive fixations; i.e., the greater mean percentage of
fixations on the pattern implied a more complete
observation. The low performers appeared to be satisfied
with a less complete analysis, and they looked to the
options for clues to the answer; i.e., looking for a
specific answer among the options would not require as many
fixations as attempting to eliminate four options. Since in
the third examination of the patterns the low performers
fixated consecutively for a mean percentage of the total
fixations that was three times larger than the high
performers' percentage, two different processes may have
been occurring. The low performers appeared to be still
searching for the rule; the high performers may have been
verifying or correcting minor errors.


TABLE I


EYE MOVEMENT MEASURES FOR LOW AND HIGH PERFORMERS ON NUMBER
SERIES AND FIGURE ANALOGIES


                                     Number Series    Figure Analogies
                                     Low     High     Low     High
Mean No. Fixations per Problem        37      21       21      18
Mean No. Regressions per Problem      29       8        9       7
Total Dur. Fixation (1/30 Sec.)      444     287      244     195
Total Dur. Regression (1/30 Sec.)    186     121       98      80
Mean No. Correct Answers             5.4     5.9      4.5     6.2


TABLE II


MEAN PERCENTAGE OF CONSECUTIVE FIXATIONS ON PATTERNS AND OPTIONS


Mean Percentage         Number Series               Figure Analogies
of Consecutive        Low          High           Low          High
Fixations           P%    O%     P%    O%       P%    O%     P%    O%
I Examination      45.0  10.3   60.6  12.0     24.4  21.1   39.9  25.5
II Examination     17.3   7.5   17.0   6.1     10.5  13.4   10.2   9.5
III Examination     7.9   7.0    2.5   1.1      7.0   1.4    5.4   3.1


Figure 1. Percent of Total Fixations on the Pattern of the
Problems as a Function of the Level of Difficulty
(horizontal axis: Levels of Problem Difficulty, Easy,
Medium, Difficult; vertical axis: percent, 40 to 85;
differentials marked ΔNS and ΔFA)


TABLE III


MEAN FREQUENCY OF ASCRIPTION OF RATING SCALE ITEMS DISCRIMINATING
HIGH AND LOW PERFORMERS


Intervals for Mean Freq. of Ascription of Items
Rating Items of a Differential Mean Frequency for:
Low F.A.    High F.A.    Low N.S.    High N.S.

0 - 3.9     2 (3.0)     2 (2.8)
4 - 6.9     9c (6.4)    2 (5.1), 3 (4.1)    2 (5.5)
7 - 10      3 (7.0)     9c (7.1)

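
The multiples quoted for Number Series ("four to five times", "two
to three times", "three times", "seven times") can be checked
against the Number Series columns of Table II. The sketch below
uses the published values; the dictionary layout and function names
are assumptions made for illustration.

```python
# Number Series columns of Table II: examination -> (P%, O%).
TABLE_II_NUMBER_SERIES = {
    "low":  {"I": (45.0, 10.3), "II": (17.3, 7.5), "III": (7.9, 7.0)},
    "high": {"I": (60.6, 12.0), "II": (17.0, 6.1), "III": (2.5, 1.1)},
}


def pattern_to_option_ratio(group, examination):
    """How many times larger P% is than O% within one group."""
    pattern, options = TABLE_II_NUMBER_SERIES[group][examination]
    return pattern / options


def low_to_high_ratio(examination, part):
    """Low performers' percentage divided by high performers'.

    part -- 0 for the pattern column (P%), 1 for the options column (O%).
    """
    low = TABLE_II_NUMBER_SERIES["low"][examination][part]
    high = TABLE_II_NUMBER_SERIES["high"][examination][part]
    return low / high


for group in ("low", "high"):
    for examination in ("I", "II", "III"):
        ratio = pattern_to_option_ratio(group, examination)
        print(group, examination, round(ratio, 2))

print("third examination, patterns:", round(low_to_high_ratio("III", 0), 2))
print("third examination, options:", round(low_to_high_ratio("III", 1), 2))
```

The third-examination options ratio comes out somewhat above six,
which the text rounds to "seven times"; the other multiples fall
within the ranges stated.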

On Figure Analogies low performers fixated consecutively in
all three examinations on the patterns for the same mean
percentage as on the options. In contrast, the high
performers scored a greater mean percentage of consecutive
fixations on the patterns than on the options in the first
examination, yet in the second and third examinations they
scored an almost equal mean percentage on the patterns and
on the options, therein duplicating in part the results of
the low performers. Similar to the processes suggested by
the Number Series data, the high performers' first
examination of the Figure Analogies may be interpreted as a
more thorough pattern analysis than the low performers'.
But there is a contrast of possible significance between
the distribution of time in the third examination for the
Figure Analogies and Number Series data; i.e., the
difference between the mean percentage of consecutive
fixations on the patterns of high and low performers was
not as great as the same differential in the Number Series
data. Moreover, the same differential in the options in the
Figure Analogies data is less than in the Number Series
data.

In rating the verbal responses, the mean frequency of the
ascription of items to the four groups was computed.
Specifically, the mean frequencies zero to three, four to
six, and seven to ten inclusive, represented infrequent,
moderately frequent, and very frequent respective
ascriptions. From this analysis those items rated with a
mean frequency different (in terms of the three categories
above) for high and low performers on Figure Analogies were
found to be the following: (item 2) identifying differences
only, (item 3) identifying similarities and differences,
and (item 9c) answering with the correct solution, with no
error.
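
The three-category rule can be sketched as follows. The interval
boundaries are those of Table III (0 to 3.9, 4 to 6.9, 7 to 10).
The assignment of the Figure Analogies means to the low and high
groups is an assumption inferred from the table rows and the
surrounding text, and the function and variable names are
hypothetical.

```python
def frequency_category(mean_frequency):
    """Map a mean frequency of ascription (0 to 10) to its category."""
    if mean_frequency < 4:
        return "infrequent"
    if mean_frequency < 7:
        return "moderately frequent"
    return "very frequent"


# Figure Analogies means by item: (low performers, high performers).
# Group assignment assumed from Table III and the discussion.
FIGURE_ANALOGIES_MEANS = {
    "2": (3.0, 5.1),
    "3": (7.0, 4.1),
    "9c": (6.4, 7.1),
}


def discriminating_items(means_by_item):
    """Items whose category differs between low and high performers."""
    return [
        item
        for item, (low, high) in means_by_item.items()
        if frequency_category(low) != frequency_category(high)
    ]


print(discriminating_items(FIGURE_ANALOGIES_MEANS))
```

Under this rule an item discriminates only when the two group
means fall in different intervals, so two means that are numerically
close (for example 6.4 and 7.1) can still be counted as different.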
Data for items two and three (Table III) strengthen the
evaluation (from percentage of fixation data in Figure 1)
of the high performers' method as analytic, wherein they
tend to examine only differences. Notation of similarities
would be less effective and more divertive. This indicates
greater purposefulness in the high performers' method of
problem solving than in that of the low performers.

The finding that item 9c, answering with a correct solution
with no errors, discriminated only between the low and high
performers on Figure Analogies is corroborated by the eye
movement data (Table I). From these, the differential in
mean numbers of correct answers was found to be greater for
the low and high performers on Figure Analogies than on
Number Series. This indicates a power differential for the
low and high performers.

The finding that there were few rating items with a
differential mean frequency for low and high performers, as
yielded by this scale, may be attributable to the time
factor in the administration of the test from which the
selection of the subjects was made. For example, the
effects of these variable factors may have largely
contributed to concealing the low performers' solution as
characteristically eliminative and that of the high
performers as characteristically analytic.


Summary and Conclusions


This experiment was conducted to disclose the differential
problem solving processes employed by high and low
performers, respectively, in the solution of Number Series
and Figure Analogies problems from the ACE Psychological
Examination. A sample of 31 college students, from two
classes in Introductory Educational Psychology, who had
missed fewer than six or more than 19 Number Series items,
or fewer than eight or more than 19 Figure Analogies items,
was examined on whichever of the above mentioned sub-tests
their performance was extreme. This examination consisted
of individual laboratory testing, first before the eye
movement camera, and second with the use of an audograph
for verbal recording. Eye movement measures were compiled
from the developed film, and the verbal recordings were
evaluated on a dichotomous rating scale.

Relevant to the questions investigated, the 
results suggest the following conclusions: 


1. High performers exhibit fewer fixations and regressions
than low performers in solving Number Series problems. High
and low performers employed a similar number of fixations
and regressions in Figure Analogies problems.

2. High performers have a total duration of 
fixations and regressions which is less than the 
low performers’ in solving Number Series prob- 
lems. For Figure Analogies problems the dura- 
tion of fixations and regressions is not very dif- 
ferent for the two groups. 

3. The high performers fixated more on the 
pattern than on the options for all problems of 
the three levels of difficulty. Moreover, their 
percentage of fixations on the pattern increased 
with increasing difficulty level of the patterns. 
For low performers, the former finding was 
true only on the Number Series problems. The 
percentage of their fixations on the pattern in- 
creased only from the easy to the medium level 
problems then decreased for the difficult prob- 
lems. This was indicated in their performance 
on both Figure Analogies and Number Series 
problems. 

4. There was no substantiation in either the eye movement
or verbal-recorded data that high performers shift more
readily from one arithmetic process required to solve a
problem to another process for a subsequent problem.


The verification of the questions under investigation
may have been limited by the assumption
that eye movement responses are in part
symptomatic of thought processes. This may
not be true. Rather, thought processes may be
a “delayed-reaction-expression” of the eye
movement response (Thought 1 is concurrent in
time with Fixation 2); i.e., the eye movement
versus the thought response may be analogous
to the eye movement versus the voice span record.


REFERENCES 


1. American Council on Education Psychological
Examination (Washington, D. C.: American
Council on Education, 744 Jackson Place).

2. Anselmo, V. J. An Eye Movement Study of Num- 
ber Series Completions, Unpublished Mas- 
ter's Thesis, University of California, 1940. 

3. Gilbert, L. C. “An Experimental Investigation
of Eye Movements in Learning to Spell
Words,” Psychological Monographs, XLII
(1932), pp. 1-81.

4. Greening, C. P. Differential Factors in the 
Solution of Figure Analogies Problems by 
High and Low Achieving Individuals, Unpub- 
lished Master's Thesis, University of Cali- 
fornia, 1948. 


ACADEMIC ATTRITION OF ENGINEERING
TRANSFER STUDENTS


J. STANLEY AHMANN 
Cornell University 


PERIODICALLY some of the careful observers
of the higher education scene express increasing
concern about the relatively high rates of
academic attrition found at most institutions.
[...] of any benefits which a student may [...]
from a contact with a college or university,
[...] as little as a single semester, the opinion
of many is that the failure of sizable percentages
of those who start to graduate has resulted in an
unnecessary dissipation of energies and finances.
[...] of these efforts possible, appreciable gains
to both the institutions and the students
concerned are envisioned.
One approach to the problem would be, of course,
a careful screening of applicants requesting
admission to engineering curricula. Efforts to
isolate the characteristics which are highly
related to academic success in engineering have
[...] (8). The predictive usefulness of the high
school grade-point average (9), scholastic
aptitude tests (2, 5, 11), other aptitude tests
(10), achievement tests (3), interest tests (7),
and personality scales (6), individually and in
combination, has been investigated. In many
instances the results have been promising, even
though incapable of offering a near perfect
selection scheme.


In the case of engineering colleges in which
sizable numbers of students enter as transfer
students, the problem of selecting the most
promising students is further complicated. At the
Iowa State College, for example, estimates have
been made that as many as 40 percent of the
entering engineering students at the beginning of a
fall quarter had received college credit from other
institutions of higher education. A study (1) of
transfer students entering the Engineering
Division of this college during the 1946-47,
1947-48, and 1948-49 academic years revealed that
most students (80%) had attended only one college
prior to enrolling at the Iowa State College, and
that this college was usually located in Iowa or
an adjacent state and enrolled less than 2500
students. Furthermore, of the 804 students
included in the study, only 246, or 31%, graduated
in engineering. The remaining 558 either
failed, transferred to non-engineering curricula
at the Iowa State College, transferred to other
institutions of higher education, or dropped from
college for miscellaneous personal reasons.
Even though a few of the 558 may have been
academically successful elsewhere, they can be
properly classified as attrition students in the
eyes of the engineering faculty.

Although the foregoing study investigated the 
relationships between a series of numerical var- 
iables and the tendency to graduate in engineer- 
ing, no attempt was made to study the possible 
influence of non-numerical characteristics on 
this criterion. An extension of the study, there- 
fore, seemed in order. 

On the basis of a preliminary examination of 
the data available, one of the non-numerical fac- 
tors which seemed to warrant examination was 
the type of institution first attended by the trans- 
fer student. Although this factor was but one of 
many potentially influential factors, indications
were found that it was possibly more influential 
than most. Therefore, the following report is 
restricted for the most part to the single consid- 
eration of whether the type of college at which 
a transfer student first matriculated affected his 
tendency to graduate in engineering at the Iowa 
State College. 

For purposes of classification, the engineer- 
ing transfer students were considered to have 
matriculated for the first time at one of two dif- 
ferent types of institution, either one offering 
only a two-year program or one offering more 
than a two-year program. The hypothesis was 
then posed as to whether, with respect to transfer
students entering engineering curricula at the
Iowa State College, those who first matriculated 
at institutions offering only a two-year program 
differed from those who first matriculated at in- 
stitutions offering more than a two-year pro- 
gram in terms of tendency to graduate in engin- 
eering. 

A random sample of 256 male engineering
transfer students was selected from the 804 stu- 
dents included in the original study. This sample 
was so drawn that students having matriculated 
at both types of institutions were equally repre- 
sented. Furthermore, since earlier research (4) 
demonstrated that students who were veterans of 
World War II tended to surpass non-veteran stu- 
dents in academic achievement, the sample was 
further sub-divided on that basis, thus yielding 
four subgroups with 64 cases included in each 
subgroup. In Table I is shown the number of 
students in each subgroup who graduated in en- 
gineering. 

TABLE I

TENDENCY TO GRADUATE IN ENGINEERING OF 256 TRANSFER STUDENTS

                                  Veteran Status
                           Yes                No               Total
Type of College        Grad.  Attrition   Grad.  Attrition   Grad.  Attrition

Two-Year           k    19       45        12       52        31       97
Program Only       %   29.7     70.3      18.8     81.2      24.2     75.8

More Than          k    28       36        17       47        45       83
Two-Year Program   %   43.8     56.2      26.6     73.4      35.2     64.8

Total              k    47       81        29       99        76      180
                   %   36.7     63.3      22.7     77.3      29.7     70.3

Inspection of this table revealed that, when
individual differences in academic aptitude were
ignored, sizable differences in tendency to
graduate existed. The students first matriculating
at an institution with more than a two-year
program seemed to graduate in distinctly greater
numbers than those who first matriculated at
institutions offering only a two-year program.
Also, veteran students obviously surpassed
non-veteran students with respect to this criterion.
Of the four subgroups, the veteran students first
matriculating at institutions offering more than
a two-year program seemed definitely to excel.

To test the significance of the differences in
tendency to graduate in engineering, an analysis
of variance can be computed provided an assumption
is made concerning the nature of the
graduation-attrition dichotomy. In this case, the
assumption was made that the tendency to graduate
in engineering was a single normally distributed
variable and was no more sensitively measured
than by the graduation-attrition classification.
This assumption does, therefore, underlie all of
the procedures and interpretations made in the
following paragraphs.

The steps followed in the computation of the
analysis of variance were the same as those
reported by Wert, Neidt, and Ahmann (12).
According to this procedure sums of squares were
found for the main effects of type of college and
veteran status as well as for the interaction by
designating z/p as the value to be assigned to each
member of the graduation groups, and -z/q as the
value to be assigned to each member of the
attrition groups. The quantities p and q are the
proportions of the total sample of 256 students
who graduated and did not graduate in engineering
respectively. The value z is the height of
the ordinate dividing the normal curve of unit
area into p and q parts. The entries in the
analysis of variance table were then found in
somewhat the same manner as in the problems in
which a numerical criterion is present. The
results are summarized in Table II.
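
As a rough check on the procedure just described, the following Python sketch recomputes the sums of squares from the counts in Table I. It assumes that the value assigned to each graduate is z/p and the value assigned to each attrition student is -z/q; this reading is an inference from the definitions of p, q, and z, not a quotation of the Wert, Neidt, and Ahmann text. The variable and function names are illustrative only, and the within mean square (0.5755) is copied from Table II rather than recomputed.

```python
from statistics import NormalDist

# Graduation and attrition counts per subgroup, as read from Table I.
# Key: (type of college, veteran status) -> (graduates, attrition students)
COUNTS = {
    ("two_year", "veteran"): (19, 45),
    ("two_year", "non_veteran"): (12, 52),
    ("more_than_two_year", "veteran"): (28, 36),
    ("more_than_two_year", "non_veteran"): (17, 47),
}

n_total = sum(g + a for g, a in COUNTS.values())      # 256 students
p = sum(g for g, _ in COUNTS.values()) / n_total      # proportion graduating
q = 1.0 - p                                           # proportion not graduating

# z: height of the unit normal curve at the point dividing it into areas q and p.
std_normal = NormalDist()
z = std_normal.pdf(std_normal.inv_cdf(q))

GRAD_VALUE = z / p     # assumed value for each graduate
ATTR_VALUE = -z / q    # assumed value for each attrition student


def group_totals(key_index=None):
    """Return {group: (sum of assigned values, n)}; key_index=None groups by cell."""
    totals = {}
    for key, (grads, attr) in COUNTS.items():
        group = key if key_index is None else key[key_index]
        s, n = totals.get(group, (0.0, 0))
        totals[group] = (s + grads * GRAD_VALUE + attr * ATTR_VALUE,
                         n + grads + attr)
    return totals


def between_ss(totals):
    """Between-groups sum of squares: sum(T_g^2 / n_g) - (grand total)^2 / N."""
    grand = sum(s for s, _ in totals.values())
    return sum(s * s / n for s, n in totals.values()) - grand * grand / n_total


ss_college = between_ss(group_totals(0))
ss_veteran = between_ss(group_totals(1))
ss_interaction = between_ss(group_totals()) - ss_college - ss_veteran

MS_WITHIN = 0.5755  # copied from Table II, not recomputed here

for name, ss in [("Type of College", ss_college),
                 ("Veteran Status", ss_veteran),
                 ("Interaction", ss_interaction)]:
    # Each effect has one degree of freedom, so its mean square equals its SS.
    print(f"{name:16s} SS = {ss:7.4f}   F = {ss / MS_WITHIN:5.2f}")
```

Under these assumptions the sketch gives sums of squares of about 2.10, 3.48, and 0.17, which agree with the entries of Table II to within rounding.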
The F-value for the type of college main effect
failed to meet significance at the 5% level
by a very slight amount. The conclusion, therefore,
was considered to be in doubt. The possibility
remained that those transfer students who
first matriculated at institutions offering only a
two-year program did, as a group, experience
greater difficulty in graduating because of that
fact. In the case of the remaining two F-values,
the significance of that for the veteran status
main effect and the non-significance of the value
for the interaction were not surprising.

In the foregoing analysis any individual differences
in studentship which might have influenced
tendency to graduate in engineering on the
part of transfer students have been ignored. To
investigate the possible influence of type of college
on transfer students' tendency to graduate
in engineering, an analysis corresponding closely
to the analysis of covariance was needed in
which individual differences in studentship were
controlled.

The quantitative raw scores on the American 
Council on Education Psychological Examination 
and the high school grade-point averages were 
available for all students and were used as indi- 
cators of studentship. The latter values were 
tabulated on an A, B, C, D, and F basis, and 
then converted to a 4, 3, 2, 1, and 0 basis. The 
mean values of both variables are shown in Table
III.

In all four subgroups the difference between 
the graduation group and the corresponding at- 
trition group was striking with respect to caliber 
of studentship as represented by these two vari- 
ables. Differences between the means of the 
quantitative scores were often as great as 10 points
and once almost 20 points. Differences between 
the means of the high school grade-point aver- 
ages were usually 0.2 or 0.3. In every instance 
the graduation group surpassed the attrition 
group. 

Of additional importance, even though not included
as such in Table III, was the fact that, as
a group, the transfer students representing the
one type of college differed from those representing
the other in the following manner. The
mean quantitative score and mean high school
grade-point average for the transfer students
who first matriculated at an institution with only a
two-year program were 61.6 and 2.62 respectively.
The corresponding values for the transfer
students who first matriculated at institutions
offering more than a two-year program were 64.4
and 2.68. In terms of these two variables, therefore,
the institutions offering the longer program
tended to attract the better students.

In order to control on the individual differ- 
ences in studentship as represented by these two 
measures, the analysis of variance shown in 
Table II was expanded into a variation of the or- 
dinary analysis of covariance. This variation, 
although much the same as the original analysis 
of covariance, employed modified discriminant 
functions (12) in place of the regression equa- 
tions. The discriminant functions were of the 
same number and type as the regression equa- 
tions used in covariance analysis and served 
much the same function. 

TABLE II

ANALYSIS OF VARIANCE OF TENDENCY TO GRADUATE IN ENGINEERING

Source of            Degrees of     Sum of       Mean
Variation            Freedom        Squares      Squares      F

Type of College          1           2.1026      2.1026     3.65
Veteran Status           1           3.4747      3.4747     6.04
Interaction              1           0.1727      0.1727     0.30
Within                 255         146.7524      0.5755


TABLE III

MEANS OF QUANTITATIVE RAW SCORES (X1) AND HIGH SCHOOL GRADE-POINT AVERAGE (X2)

                                Veteran Status
                          Yes                No               Total
Type of College       Grad.  Attrition   Grad.  Attrition   Grad.  Attrition

Two-Year         X1   72.9     53.7      75.1     61.2      73.8     57.7
Program Only     X2   2.79     2.46      3.20     2.55      2.95     2.51
                 k     19       45        12       52        31       97

More Than        X1   71.6     60.5      66.6     62.4      69.7     61.6
Two-Year         X2   2.86     2.52      2.88     2.64      2.86     2.58
Program          k     28       36        17       47        45       83

Total            X1   72.1     56.7      70.1     61.8      71.4     59.5
                 X2   2.83     2.49      3.01     2.59      2.90     2.55
                 N     47       81        29       99        76      180


TABLE IV

ANALYSIS OF COVARIANCE OF TENDENCY TO GRADUATE IN ENGINEERING

                     “Within” Plus Source of Variation           Source of Variation Alone
                     Unadjusted        Proportion     Adjusted            Adjusted
Source of                              Not
Variation          d.f.     S.S.       Associated   d.f.     S.S.       d.f.    M.S.      F

Type of College    256    148.8550      0.8458      254    125.9016      1     1.0153    2.06
Veteran Status     256    150.2271      0.8588      254    129.0150      1     4.1287    8.36
Interaction        256    146.9251      0.8508      254    125.0039      1     0.1176    0.24
Within Alone       255    146.7524      0.8510      253    124.8863            0.4936


The results of the analysis are shown in Table
IV. It should be noted that the proportions of the
individual differences in graduation tendency
that could be explained by variations in the
quantitative scores and the high school grade-point
averages were computed, and were then expressed
as the proportion of the variance representing
individual differences in graduation
tendency not associated with variations in the
two numerical variables. The resulting values
were 0.8458, 0.8588, 0.8508, and 0.8510. With
these known proportions, it was possible to return
to the information assembled in the analysis
of variance shown in Table II, and remove,
as in the common analysis of covariance, any
allowance which need be made because of individual
differences between the groups on the
control factors. The adjusted sums of squares
were converted to mean squares and the F-values
computed in the usual manner.
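
The adjustment can be retraced from the printed figures. The Python sketch below assumes that each adjusted sum of squares is the unadjusted sum of squares multiplied by the corresponding proportion not associated with the two control variables, and that the adjusted sum of squares for a source alone is the adjusted “within plus source” value less the adjusted “within alone” value. Both steps are inferred from the description above and from the entries of Tables II and IV; the variable names are illustrative only.

```python
# Unadjusted sums of squares, as printed in Table II.
SS_WITHIN = 146.7524
SS_SOURCE = {
    "Type of College": 2.1026,
    "Veteran Status": 3.4747,
    "Interaction": 0.1727,
}

# Proportion of variance NOT associated with the two control variables (Table IV).
PROPORTION = {
    "Type of College": 0.8458,
    "Veteran Status": 0.8588,
    "Interaction": 0.8508,
    "Within Alone": 0.8510,
}

DF_WITHIN_ADJUSTED = 253  # adjusted within degrees of freedom, as printed in Table IV

# Adjusted within sum of squares and the mean square used as the error term.
adj_within = SS_WITHIN * PROPORTION["Within Alone"]
ms_within = adj_within / DF_WITHIN_ADJUSTED

f_values = {}
for source, ss in SS_SOURCE.items():
    adj_combined = (SS_WITHIN + ss) * PROPORTION[source]  # "within" plus source, adjusted
    adj_source = adj_combined - adj_within                # source alone, 1 d.f.
    f_values[source] = adj_source / ms_within
    print(f"{source:16s} adjusted SS = {adj_source:6.4f}   F = {f_values[source]:4.2f}")
```

Run as written, the sketch gives F-values of about 2.06, 8.36, and 0.24, in agreement with Table IV.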

The F-value for the type of college main effect
failed to reach the 5% level of significance.
Therefore, insofar as the quantitative scores
and high school grade-point averages controlled
individual differences in studentship, and no
other factors contributed a bias, no significant
differences have been found in tendency to graduate
in engineering at the Iowa State College between
transfer students first matriculating at an
institution offering only a two-year program and
those transfer students first matriculating at
institutions offering more than a two-year program.
It was concluded that the possibility that
matriculating at an institution offering a broader
program enhanced a transfer student's tendency
to graduate in engineering, as suggested in
the analysis in Table II, disappeared when
individual differences in studentship were considered.
As suggested in an earlier paragraph, it
can be inferred that it was not the type of program
offered as such which caused transfer students
from two-year programs to tend to have
greater difficulty in graduating in engineering,
but rather that such institutions seemed to
enroll less talented students in this instance.


REFERENCES 


1. Ahmann, J. Stanley. Prediction of Achievement
of Iowa State College Engineering
Students Having Transferred from Other
Institutions, Unpublished Ph.D. Dissertation,
Iowa State College Library, Ames,
Iowa, 1951.

2. Berdie, R. F. and Sutter, N. A. “Predicting
Success of Engineering Students,”
Journal of Educational Psychology, XLI
(March 1950), pp. 184-190.

3. Feder, D. D. and Adler, D. L. “Predicting
the Scholastic Achievement of Engineering
Students,” Journal of Engineering
Education, XXIX (January 1939), pp. 380-385.

4. Gowan, A. M. Unique Characteristics of
Freshman Veterans at the Iowa State College
with Administrative Implications, Unpublished
Ph.D. Dissertation, Iowa State
College Library, Ames, Iowa, 1947.

5. McClanahan, W. R. and Morgan, D. H. “Use
of Standard Tests in Counseling Engineering
Students in College,” Journal of Educational
Psychology, XXXIX (December 1948),
pp. 491-501.

6. MacRae, J. M. Usefulness of the Minnesota
Personality Scale for Predicting Achievement
of Freshman Engineering Students,
Unpublished M.S. Thesis, Iowa State College,
Ames, Iowa, 1949.

7. Minor, W. T. Usefulness of the Kuder Preference
Record for Predicting Academic
Success of Iowa State College Engineering
Freshmen, Unpublished M.S. Thesis, Iowa
State College Library, Ames, Iowa, 1947.

8. Moore, J. E. “A Decade of Attempts to Predict
Scholastic Success in Engineering
Schools,” Occupations, XXVIII (November
1949), pp. 92-96.

9. Pierson, G. A., Jr. “School Marks and Success
in Engineering,” Educational and Psychological
Measurements, VII (Autumn
1947), pp. 612-617.

10. Treumann, M. J. and Sullivan, B. A. “Use
of the Engineering and Physical Science
Aptitude Test as a Predictor of Academic
Achievement of Freshmen Engineering Students,”
Journal of Educational Research,
XLIII (October 1949), pp. 129-133.

11. Vaughn, K. W. “The Yale Scholastic Aptitude
Tests as Predictors of Success in College
of Engineering,” Journal of Engineering
Education, XXXIV (April 1944), pp.
572-582.

12. Wert, James E. and others. Statistical
Methods in Education and Psychology (New
York: Appleton-Century-Crofts, Inc.,
1954), Chs. 15 and 19.




COLLEGE LEVEL STUDY SKILLS PROGRAMS: 
SOME OBSERVATIONS 


WALTER S. BLAKE, Jr. 
University of Maryland 


COLLEGE-LEVEL study skills programs are
becoming more numerous. Twenty-four institutions
are planning such programs for the near
future. Institutions of higher learning are enrolling
anywhere from seven to 1400 students in
such programs in the United States and Possessions,
and all programs in which evaluations
have been undertaken report favorable results.
Most of the programs seem to resemble
“Topsy” somewhat: they just “growed up”
without the benefit of the experiences of others,
by virtue of the fact that the experiences of others
in this field have not been reported in the
literature in any appreciable measure.
The University of Maryland began a program
in 1947, and it, too, grew largely out of
experimentation at Maryland rather than as a result
of the experiences of workers in other programs.
However, a study was undertaken in 1953 to survey
and evaluate both the program at the University
of Maryland and other programs in operation
throughout the United States and Possessions
in order to improve the program at the
University of Maryland in the light of the findings.*
The workers in the University of Maryland program
feel that at least part of what they found
out could benefit workers in other programs, and
the following observations are therefore presented
in the hope that the many program workers
and their students will find the information
useful to them.

1. Most programs offer services to a limited
segment of the school population. Forty-two
and two-tenths percent admit voluntary and referral
students (probationers, etc.); 40% admit
only voluntary students, and 11.1% require all
freshmen to enroll (with a few taking voluntary
students as well). Six percent did not report in
this area. The wide variation of admission policies
is surprising since the consensus is that
any study skills program is composed of guidance
services which should be available to the
entire student body if the program is to attain
its greatest effectiveness. All entering freshmen
should be assigned to a program designed
to indoctrinate them to the life on campus plus
the minimal skills needed to achieve their goals
at college and afterward; and the services of the
program (tutorial, remedial reading, study
skills and reading courses, counseling, etc.)
should be open to all students on campus who
feel a need for such services.

2. The “remedial” aura still surrounds and 
plagues study skills programs, in general. The 
remedial phase(s) of most programs take pre- 
cedence over the preventative phases, with the 
result that very few schools make provisions 
for helping any students other than those who 
must be helped. The “average” student is 
obliged to struggle along without assistance un- 
til he, or some faculty member, notices that he 
is about to fail out of college, at which time
“remedial” measures may be taken (if it is not
already too late). In most institutions where no
required program for freshmen is offered, fac- 
ulty referrals and self-referrals are the only 
means available to help prevent academic fail- 
ure and social maladjustment. 

3. Program-planning with students is conspicuously
lacking in many of the programs surveyed.
Small staffs and insufficient operating
funds usually account for this; yet the absence 
of student-faculty planning is a serious short- 
coming, nonetheless, in programs of this kind. 
The types and extent of services offered should 
be the result of student-faculty planning, based 
upon research findings. One way to help insure
student participation in the program is to incor- 
porate student-faculty planning as a part of the 
program itself. Written student evaluations, 
soliciting student suggestions, interviews with 
students, consultation with student government 
leaders, and regularly scheduled student-faculty 
meetings are useful methods. The main point 
here is this: faculty-seen needs are not neces- 
sarily student-seen needs—a well-known fact 
often overlooked. It is recognized that a well- 
trained faculty might know more about what stu- 
dents need than the students themselves, yet this 
obviously does not guarantee student acceptance 
of a program planned entirely by faculty mem- 
bers. Student-faculty planning might well be 
termed a “calculated risk” in the study skills
area; but it seems no less essential than 
in any other situation where democratic proced- 
ures seem likely to produce the best results. 


*This article is based upon a doctoral study completed by the author entitled: A Survey and Evaluation
of Study Skills Programs at the College Level in the United States and Possessions, University of Maryland.




4. Research is being done neither in the minimal
quantity necessary nor in the areas where
it is most needed. The quantity of research
needed will necessarily be governed by needs of
individual programs; but every program needs 
research of the kind which will indicate (1) wheth- 
er the program is achieving set goals, and (2) 
what needs to be done to improve the program. 
While it is true that program workers spend 
most of their time giving service (as do most 
people in the various branches of the teaching 
profession), it is equally true that a part of
every worker’s time needs to be devoted to research
in the program if the program is to be
successful, and if the workers are to have con- 
fidence in the program itself as well as their 
part in the program. Research is needed par- 
ticularly in these areas: program evaluation, 
program improvement, and validation of diag- 
nostic instruments. 

5. Over half (51.1%) of the programs surveyed
do not give academic credit for participation
in the formalized parts (classes in study
skills and reading, mainly) of the programs. 
Credit is “expected” by college students for
work done under the auspices of the institution 
out of habit and tradition. Good or bad, it is 
nonetheless true that college credit is a motivat- 
ing factor with college students—perhaps the 
most important single motivating factor. It is
also true that student initiative is important to 
any student’s success or failure in meeting or 
solving his problems. Therefore, it seems im- 
portant to make the process of problem-solving 
in any group guidance situation as profitable as 
possible to students in order to nurture initia- 
tive. Some workers who do not give academic 
credit feel that some of the services rendered 
and some of the course materials and techniques 
used are not “college level” in terms of the conventional
college-level courses. While such may
actually be the case in many programs, the fail- 
ure to grant some credit for work accomplished 
may doom good programs to ineffectuality, no 
matter how fine such programs may be potenti- 
ally. 

6. Study skills programs need workers
trained to work in study skills programs. At
present, nearly all workers are educators, psy- 
chologists, or other kinds of specialists not nec- 
essarily trained to be workers in study skills 
programs. Workers having majored in areas
such as education and psychology might have 
some of the general qualifications needed (like 
the desire to work with students); but workers 
could have the special qualifications needed only 
by chance. For example, educators do not us- 
ually learn psychology in their curriculum, and 
psychologists do not learn teaching methods; 
yet both psychology and teaching methods are 
acknowledged to be two of the important special 
qualifications desirable for program workers by 
program workers themselves. Only one institu- 
tion, out of the many contacted in the survey, of- 
fers a training program specifically for study 
skills program workers, yet hundreds of per- 
sons are now employed in such programs, and 
24 institutions plan such programs for the future. 

7. Study skills programs are not publicized
adequately, as a rule; indeed, some are kept
on a “confidential” basis among staff members.
The reticence on the part of the program workers
to make their services known does a disservice
to the student body and also prevents the
programs from reaching their maximum level of
effectiveness. Evidence points to frugal financing
of such programs as the basic reason for curtailed
services as well as lack of publicity about
services offered; but it seems certain that a
program designed to help students cannot be kept
secret from students and at the same time serve
their needs. The publicizing of programs need
not be of the conventional advertising variety, of
course; but the program should be made known
to all students through written notices concerning
services, hours, etc., articles in the campus
literature which will reach and be read by
both students and faculty, and any other device
available to workers. The students and faculty
who have received satisfactory service provided
by the program will, of course, be the best
publicity mediums, once the program has been
operating long enough to become known on the
campus.


